WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification 6:
A61K 38/17, C07K 7/00, 14/435, 16/00,
C12N 15/09, 15/12, 15/63, 15/70, 15/81

A1

(11) International Publication Number: WO 95/17205
(43) International Publication Date: 29 June 1995 (29.06.95)



(21) International Application Number: PCT/US94/14356

(22) International Filing Date: 13 December 1994 (13.12.94)

(30) Priority Data:
08/171,382    21 December 1993 (21.12.93)    US

(60) Parent Application or Grant
(63) Related by Continuation
US 08/171,382 (CIP)
Filed on 21 December 1993 (21.12.93)



(71) Applicant (for all designated States except US): IMMUNOBIOLOGY
RESEARCH INSTITUTE, INC. [US/US]; Route 21 East, P.O. Box 999,
Annandale, NJ 08801-0999 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): HARRIS, Crafford, A.
[US/US]; 132 Parsons Street, Easton, PA 18042 (US).
GOLDSTEIN, Gideon [US/US]; 30 Dorison Drive, Short
Hills, NJ 07078 (US). SIEKIERKA, John, J. [US/US];
10 Glenview Drive, Towaco, NJ 07082 (US). TALLE,
Mary, Anne [US/US]; 620 South Randolphville Road,
Piscataway, NJ 08854 (US). SHENBAGAMURTHI, Ponniah
[IN/US]; 250 Riverview Drive, Bridgewater, NJ 08807
(US). CULLER, Michael, D. [US/US]; 613 Paxinosa Avenue,
Easton, PA 18042 (US). SETCAVAGE, Diane, R.
[US/US]; 111 Fairview Avenue, Milford, NJ 08848 (US).

(74) Agents: BAK, Mary, E. et al.; Howson and Howson, Spring
House Corporate Center, P.O. Box 457, Spring House, PA
19477 (US).

(81) Designated States: AU, BR, CA, CN, CZ, FI, HU, JP, KR,
NO, NZ, PL, US, VN, European patent (AT, BE, CH, DE,
DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).



Published 

With international search report. 



(54) Title: RECOMBINANT HUMAN THYMOPOIETIN PROTEINS AND USES THEREFOR 
(57) Abstract 

The present invention provides novel nucleotide and amino acid
sequences for human thymopoietin α, β, and γ, methods of
recombinantly expressing same, and diagnostic and therapeutic
uses thereof.



FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages
of pamphlets publishing international applications under the PCT.

AT  Austria
AU  Australia
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CS  Czechoslovakia
CZ  Czech Republic
DE  Germany
DK  Denmark
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgystan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakhstan
LI  Liechtenstein
LK  Sri Lanka
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SI  Slovenia
SK  Slovakia
SN  Senegal
TD  Chad
TG  Togo
TJ  Tajikistan
TT  Trinidad and Tobago
UA  Ukraine
US  United States of America
UZ  Uzbekistan
VN  Viet Nam


RECOMBINANT HUMAN THYMOPOIETIN PROTEINS AND USES THEREFOR 

Field of the Invention 

The present invention relates generally to human
thymopoietin proteins and their use in diagnosis and
therapy of various immune and nervous system conditions.

Background of the Invention 

Thymopoietin is a polypeptide produced by cells of
the thymus and other cells, which has been implicated in
various immune and nervous system pathways. There have
been several attempts to isolate and sequence various
species of thymopoietin. Thymopoietin was originally
isolated as a 5 kDa, 49 amino acid protein from bovine
thymus [Goldstein et al, Nature, 247:11-14 (1974). See
also, Schlesinger and Goldstein, Cell, 5:361-365 (1975)].
Later work described by T. Audhya et al, Biochemistry,
20(21):6195-6200 (1981) purported to provide the complete
sequences for bovine thymopoietins. Three 49 amino acid
sequences were described therein. Zevin-Sonkin et al,
Immunol. Lett., 31:301-310 (1992) report the isolation of
a bovine cDNA using oligonucleotide probes based on the
original 49 amino acid bovine TP protein sequence
[Schlesinger and Goldstein, cited above], which encodes
the originally determined sequence at the N-terminus of a
larger open reading frame.

The active site of thymopoietin, a pentapeptide of
the sequence Arg-Lys-Asp-Val-Tyr [SEQ ID NO: 7], was
described by G. Goldstein et al, Science, 204:1309-1310
(1979) and in U.S. Patent No. 4,190,646. There is a
wealth of art describing analogs of the active site,
termed thymopentin, and their uses.

Attempts to isolate and sequence thymopoietin
continue. For example, European Patent Application
502,607 describes bovine thymopoietin or thymopoietin-like
cDNA clones.

Despite these publications and the knowledge of
thymopoietin, to date, the cloning of the complete human
thymopoietin gene and its recombinant expression has not
been described. There remains a need in the art for a
convenient method of producing human thymopoietin,
fragments thereof, and polynucleotide sequences encoding
the protein.

Summary of the Invention

In one aspect, the invention provides three novel
polynucleotide sequences encoding human thymopoietin
proteins referred to as α, β and γ, isolated from other
cellular materials with which they are naturally
associated, and having a biological activity associated
with immune function. These polynucleotide sequences are
illustrated in Fig. 1 [SEQ ID NO: 1], Fig. 2 [SEQ ID NO: 3]
and Fig. 3 [SEQ ID NO: 5]. Fragments of these sequences
are also embodied by this invention. These sequences or
fragments thereof may also be optionally associated with
conventionally used labels for diagnostic or research
use.

In another aspect, the invention provides an
expression vector which contains at least a
polynucleotide sequence described above. In still
another aspect, a host cell transformed with such an
expression vector is provided.

In still another aspect, the present invention
provides a method for producing a recombinant human
thymopoietin protein which involves transforming a host
cell with an expression vector containing a recombinant
polynucleotide encoding a human thymopoietin protein by
incubating the host cell and expression vector, and
following transformation, culturing the transformed host
cell under conditions that allow expression of the human
thymopoietin.

In still another aspect, the present invention
provides three proteins characterized by having activity
in the immune system. These proteins are illustrated in
Figs. 1-3, and are designated herein as α [SEQ ID NO: 2],
β [SEQ ID NO: 4], and γ [SEQ ID NO: 6], respectively. These
proteins are characterized by being isolated from the
cellular material with which they are naturally
associated. Advantageously, one or more of these
sequences is capable of being produced recombinantly.

In yet another aspect, the present invention
provides a pharmaceutical composition containing at least
one of the thymopoietin proteins α, β or γ, and a
pharmaceutically acceptable carrier.

In another aspect, the invention provides a method
of treating a subject with a disorder of the immune or
nervous system by administering to the subject a
pharmaceutical composition of the invention.

In yet a further aspect, the invention provides a
diagnostic reagent, such as a polyclonal or monoclonal
antibody generated by use of one of these thymopoietin
proteins or fragments thereof.

In another aspect, the invention provides a
diagnostic reagent, such as a DNA probe, i.e., an
oligonucleotide fragment derived from the polynucleotide
sequence encoding one of the proteins of the invention or
from the complementary strand. The reagents may be
optionally associated with a detectable label.

Other aspects and advantages of the present
invention are described further in the following detailed
description of the preferred embodiments thereof.

Brief Description of the Drawings

Figs. 1A-1D illustrate the continuous nucleic acid
[SEQ ID NO: 1] and amino acid [SEQ ID NO: 2] sequences of
human thymopoietin α.

Figs. 2A-2C illustrate the continuous nucleic acid
[SEQ ID NO: 3] and amino acid [SEQ ID NO: 4] sequences of
human thymopoietin β.

Figs. 3A-3C illustrate the continuous nucleic acid
[SEQ ID NO: 5] and amino acid [SEQ ID NO: 6] sequences of
human thymopoietin γ.

Fig. 4A provides a schematic diagram of the protein
sequence of thymopoietin protein α. The solid portion
(aa 1-187) represents the sequence common to α, β and γ.
The vertically lined portion (aa 187-693) represents the
α-specific domain.

Fig. 4B provides a schematic diagram of the protein
sequence of thymopoietin protein β. The solid portion
(aa 1-187) is common to α, β and γ. The shaded portions
(aa 187-220 and 329-453) are common to β and γ. The
diagonally lined portion (aa 220-329) is the β-specific
domain. There is a potential hydrophobic
membrane-spanning domain at the C-terminal end of the protein.

Fig. 4C provides a schematic diagram of the protein
sequence of thymopoietin protein γ. The solid portion
(aa 1-187) is common to α, β and γ. The shaded portion
(aa 187-344) is common to β and γ. The C-terminal end
has a potential hydrophobic membrane-spanning domain.

Fig. 5 provides a genomic map of the human TP gene.
The exons are numbered 1 through 8 and indicated on the
linear diagram as either solid or clear boxes (see 1, 4
and 8) or as solid black lines (see 2, 3, 5, 6 and 7).
The protein coding regions of the exons are the solid
lines/boxes. The clear boxes do not encode proteins.
Introns and flanking regions are indicated by the
horizontal line. Intron lengths in kbp (some
approximate) are shown above introns. Sites for
restriction enzymes BamHI (B), Eco RI (R), Pme I (P),
and SacII (S) are shown. The Eco RI site marked with an
asterisk is not present in A.SHG-1 DNA. The orientation
of the PI. 517 insert is shown by the positions of the Not
I (N) and Sfi I (Sf) sites in the flanking vector
sequences. Approximate positions of Alu repeats are
identified (A). The question marks indicate unsequenced
or uncloned regions.

Figs. 6A-6C provide a continuous fragment of the TP
gene [SEQ ID NO: 12] including exon 1 (which starts 5' at
a nucleotide between numbers 1888 and 1901 through
nucleotide 2479), a 5' UTR (nucleotides between numbers
1888 and 1901 through nucleotide 2200) and intron 1
(nucleotides 2480-2509).

Fig. 7 provides a fragment of the TP gene [SEQ ID
NO: 13] which includes exon 2 (nucleotides 120-246).

Fig. 8 provides a fragment of the TP gene [SEQ ID
NO: 14] which includes exon 3 (nucleotides 130-288).

Figs. 9A-9G provide a continuous fragment of the TP
gene [SEQ ID NO: 15] which includes exon 4 (which spans
nucleotides 39 to a nucleotide between numbers 1956 and
2843) and exon 5 (nucleotides 4691-4788).

Fig. 10 provides a fragment of the TP gene [SEQ ID
NO: 16] which includes the 3' end of exon 6 (nucleotides
82-128).

Figs. 11A-11C provide a fragment of the TP gene [SEQ
ID NO: 17] which includes the 3' end of exon 6
(nucleotides 1-54), exon 7 (nucleotides 1357-1445), and a
partial sequence for exon 8 (nucleotides 2572-3234).

Figs. 12A-12C are the nucleic acid sequence of the
5' end of the human TP gene [SEQ ID NO: 18]. The two
major apparent transcription initiation sites determined
by primer extension are marked with asterisks
(nucleotides +1 and +14). The primer used is underlined
and labeled. Sequences identical or closely similar to
binding site sequences for known transcription factors
are indicated by underlining and the name of the
transcription factor. A direct repeat is underlined with
half arrows pointing in the same direction. An inverted
palindrome is underlined with half arrows pointing in
opposite directions.

Figs. 13A-13B are the nucleic acid sequence of 3'
untranslated and 3' flanking regions of TP α mRNAs [SEQ
ID NO: 19]. An asterisk marks the stop codon at the end
of the TP α protein-coding region. Numbering is from the
upstream major apparent mRNA start site (+1). The AATAAA
[SEQ ID NO: 45] (AAUAAA [SEQ ID NO: 46]) sequence
upstream of the major poly-adenylation site is
underlined, as is an ATTAAA [SEQ ID NO: 47] (AUUAAA [SEQ
ID NO: 48]) sequence upstream of the minor
poly-adenylation site. The GAACAGTG(A/T)TGT [SEQ ID NO: 20]
sequence present just downstream of the A(T/A)TAAA [SEQ
ID NOS: 45 and 47] sequence at both poly-adenylation
sites is also underlined. The G at the 5' end of this
conserved sequence is the last gene-encoded nucleotide
preceding the poly-A sequences in the mRNAs.
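The motifs named in the description of Figs. 13A-13B can be located in a sequence with a simple pattern search. The following is a minimal sketch only: the function name and the toy sequence are hypothetical illustrations, and the toy sequence is not taken from SEQ ID NO: 19. The bracketed character classes encode the (A/T) alternatives given in the text.

```python
import re

# Patterns transcribed from the Fig. 13 description; [AT] encodes the
# (A/T) alternative stated there.
POLY_A_SIGNAL = re.compile(r"A[AT]TAAA")
DOWNSTREAM_ELEMENT = re.compile(r"GAACAGTG[AT]TGT")


def find_motif(pattern, sequence):
    """Return (0-based start, matched text) for each non-overlapping match.

    Hypothetical helper: accepts DNA or mRNA spelling (U is read as T).
    """
    dna = sequence.upper().replace("U", "T")
    return [(m.start(), m.group()) for m in pattern.finditer(dna)]


# Hypothetical toy sequence for illustration, NOT from SEQ ID NO: 19.
toy = "CCAATAAAGAACAGTGATGTCC"
signals = find_motif(POLY_A_SIGNAL, toy)
elements = find_motif(DOWNSTREAM_ELEMENT, toy)
print(signals)   # [(2, 'AATAAA')]
print(elements)  # [(8, 'GAACAGTGATGT')]
```

In this toy case the conserved element begins immediately after the hexamer, which mirrors the "just downstream" arrangement described for both poly-adenylation sites.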


Detailed Description of the Invention 

The present invention provides novel recombinant
human thymopoietin (rhTP) nucleic acid sequences and
proteins, designated α, β, and γ. These sequences are
provided in Figs. 1-3 [SEQ ID NO: 1-6], respectively.
Advantageously, the nucleic acid sequences are useful as
diagnostic probes, in gene therapy, and in the production
of thymopoietin proteins. The proteins are useful for a
variety of therapeutic and diagnostic applications, as
well as for generation of other therapeutic and
diagnostic reagents.

In the figures, the sequences are numbered
differently than in the Sequence Listing. Specifically,
in the figures, the sequences have been numbered so that
amino acid +1 is the amino terminal proline of mature TP
and nucleotide +1 is the first nucleotide of the proline
codon. The initial Met, its codon, and the 5' end of the
sequences all are designated in negative numbers. This
is indicative of the fact that the initiating methionine
is removed co-translationally by methionine
aminopeptidase [R. A. Bradshaw, Trends Biochem. Sci.,
14:276-279 (1989)]. In contrast, due to the limitations
of the PatentIn program, the Sequence Listing does not
contain any negative numbers. Thus, in the Sequence
Listing, the 5' non-coding region begins with positive
numbers and the first amino acid is Met. Throughout this
application, fragments of the sequences will be referred
to as in the figures, with the numbers of the Sequence
Listing following in brackets.
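The relationship between the two amino acid numbering schemes can be sketched as follows. This is an illustration only, with hypothetical function names. It assumes that the only difference for residues of the mature protein is the initiating Met, which is residue 1 in the Sequence Listing, so that figure residue n corresponds to listing residue n + 1. Nucleotide positions are not handled, because their offset depends on the length of the 5' non-coding region, which is not stated here.

```python
def figure_to_listing(residue: int) -> int:
    """Convert a figure residue number to its Sequence Listing number.

    Assumption: the listing counts the initiating Met as residue 1, so the
    amino terminal Pro (+1 in the figures) is residue 2 in the listing.
    """
    if residue < 1:
        raise ValueError("only residues of the mature protein (>= 1) are handled")
    return residue + 1


def range_to_listing(first: int, last: int) -> tuple:
    """Convert a figure residue range to the bracketed Sequence Listing range."""
    return figure_to_listing(first), figure_to_listing(last)


print(range_to_listing(1, 52))     # (2, 53)
print(range_to_listing(188, 693))  # (189, 694)
```

The two printed ranges agree with bracketed pairs used in this application, for example amino acids 1-52 [2-53] and 188-693 [189-694].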

As used herein, the term "β numbering system"
reflects the fact that two common regions are shared by
hTPβ and hTPγ and are identified by reference to the
amino acids of the hTPβ protein. Because hTPβ has a 109
amino acid insert (indicated in bold in Fig. 2),
discussed in detail below, between amino acids 220 and
221 of hTPγ, in the β numbering system, amino acid 330 of
hTPβ is equivalent to amino acid 221 of hTPγ (subtraction
of the 109 β-specific amino acids results in correct
numbering for γ). See Figs. 4B and 4C.

The present invention provides the human
thymopoietin α, β, and γ proteins. These proteins are
characterized by the amino acid sequences of Figs. 1-3,
respectively. Human TPα is 693 amino acids in length
[SEQ ID NO: 2] having a molecular weight of 75 kDa; hTPβ
is a 453 amino acid protein [SEQ ID NO: 4] having a
molecular weight of 51 kDa; and human TPγ is a 344 amino
acid protein [SEQ ID NO: 6] having a molecular weight of
39 kDa.
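As a rough plausibility check on the stated lengths and molecular weights, the mass of a protein can be approximated from its residue count. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from this application; the exact weights depend on the actual amino acid composition.

```python
# Rule-of-thumb assumption, not a value from the text.
AVERAGE_RESIDUE_MASS_DA = 110.0

# (length in amino acids, stated molecular weight in kDa) as given above.
STATED = {"alpha": (693, 75.0), "beta": (453, 51.0), "gamma": (344, 39.0)}


def estimate_kda(n_residues: int, residue_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Approximate molecular weight in kDa from the number of residues."""
    return n_residues * residue_mass_da / 1000.0


for name, (length, stated_kda) in STATED.items():
    estimate = estimate_kda(length)
    print(f"{name}: {length} aa, estimate {estimate:.1f} kDa, stated {stated_kda:.0f} kDa")
```

Under this assumption each estimate falls within about 2 kDa of the stated value, which is consistent with the stated lengths.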

TPs α, β, and γ have identical N-terminal domains
through Glu187 (indicated by an * in Figs. 1-3). This
region is termed αβγ [amino acids 2-188 of SEQ ID NO: 2,
4, 6]. See Figs. 4A-4C. After Glu187, TP α [SEQ ID NO: 2]
diverges from TPs β [SEQ ID NO: 4] and γ [SEQ ID NO: 6].
This unique region from amino acid 188 through amino acid
693 of hTPα [189-694 of SEQ ID NO: 2] is termed simply α. A
unique hTPβ region is found at amino acid 221 through
amino acid 329 [222-330 of SEQ ID NO: 4]. TPγ differs
from TPβ only in missing the β-specific domain containing
amino acids 221-329 of TPβ [222-330 of SEQ ID NO: 4]. The
two regions common to hTP β and hTP γ are from amino acid
188-220 (βγ1) [189-221 of SEQ ID NO: 4] and from amino acid
330-453 (βγ2) [331-454 of SEQ ID NO: 4], using the β
numbering system. In regions where the amino acid
sequences of TPs α [SEQ ID NO: 2], β [SEQ ID NO: 4], and γ
[SEQ ID NO: 6] are identical, their nucleotide sequences
are identical as well, consistent with their originating
via alternative splicing of transcripts from a single
gene. This gene has been localized to chromosome band
12q22. This was confirmed by sequencing of genomic
clones and fluorescence in situ hybridization to
metaphase chromosomes.
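The region boundaries and the β numbering system described above can be expressed as a small lookup. This is a sketch for illustration, with hypothetical function and region names. All positions use the figure numbering of hTPβ, and the boundaries are those stated in the text (common region 1-187, βγ1 188-220, β-specific 221-329, βγ2 330-453).

```python
BETA_LENGTH = 453
GAMMA_LENGTH = 344
BETA_SPECIFIC = (221, 329)  # beta-specific domain, figure numbering
INSERT_LENGTH = BETA_SPECIFIC[1] - BETA_SPECIFIC[0] + 1  # 109 residues


def beta_region(pos: int) -> str:
    """Name the region of hTP beta that contains a residue (beta numbering)."""
    if not 1 <= pos <= BETA_LENGTH:
        raise ValueError(f"position {pos} is outside hTP beta (1-{BETA_LENGTH})")
    if pos <= 187:
        return "alpha-beta-gamma common"
    if pos <= 220:
        return "beta-gamma 1"
    if pos <= BETA_SPECIFIC[1]:
        return "beta-specific"
    return "beta-gamma 2"


def beta_to_gamma(pos: int):
    """Map a beta-numbered residue to hTP gamma numbering.

    Returns None for residues of the beta-specific domain, which gamma lacks.
    """
    if beta_region(pos) == "beta-specific":
        return None
    return pos if pos < BETA_SPECIFIC[0] else pos - INSERT_LENGTH


print(INSERT_LENGTH)       # 109
print(beta_to_gamma(330))  # 221
print(beta_to_gamma(453))  # 344
```

The last value shows that the final residue of hTPβ maps onto residue 344 of hTPγ, which matches the stated length of the γ protein.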

Included in this invention are fragments of the TP
α, β and γ proteins [SEQ ID NOS: 2, 4, 6]. Preferably,
these fragments are at least about 3 amino acids in
length and are characterized by being biologically
active. These fragments are desirable for use in
generating therapeutic or diagnostic antibodies or for
other diagnostic purposes. Particularly desirable are
the following fragments which have been found to be
immunogenic sites. The following Table I makes use of
the nomenclature above, e.g. αβγ hTP 1-52 relates to amino
acids 1-52 of α, β and γ [amino acids 2-53 of SEQ ID NO:
2, 4 and 6].

TABLE I

Peptides              (Sequence Listing residues)   SEQ ID NOS:

αβγ hTP 1-52          (2-53)                        2, 4, 6
αβγ hTP 1-19          (2-20)                        2, 4, 6
αβγ hTP 28-39         (29-40)                       2, 4, 6
αβγ hTP 40-52         (41-53)                       2, 4, 6
αβγ hTP 29-50         (30-51)                       2, 4, 6
αβγ hTP 56-71         (57-72)                       2, 4, 6
αβγ hTP 92-108        (93-109)                      2, 4, 6
αβγ hTP 168-187       (169-188)                     2, 4, 6
α hTP 233-253         (234-254)                     2
α hTP 342-362         (343-363)                     2
α hTP 425-443         (426-444)                     2
α hTP 518-538         (519-539)                     2
α hTP 604-622         (605-623)                     2
α hTP 188-197         (189-198)                     2
α hTP 188-202         (189-203)                     2
βγ1 hTP 196-215       (197-216)                     4, 6
β hTP 247-265         (248-266)                     4
β hTP 312-329         (313-330)                     4
βγ2 hTP 332-348       (333-349)                     4, 6
βγ2 hTP 397-412       (398-413)                     4, 6



Also included in the invention are analogs of the α,
β, and γ proteins provided herein. Typically, such
analogs differ by only 1, 2, 3 or 4 codon changes.
Examples include polypeptides with minor amino acid
variations from the illustrated amino acid sequences of
α, β or γ (Figs. 1-3; SEQ ID NOS: 2, 4, 6); in
particular, conservative amino acid replacements.
Conservative replacements are those that take place
within a family of amino acids that are related in their
side chains and chemical properties.
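The notion of a conservative replacement can be sketched as a membership test on amino acid families. The grouping below (acidic, basic, nonpolar, uncharged polar) is one commonly used classification and is an assumption made for illustration; this application does not itself enumerate the families, and other groupings are in use. The function names are hypothetical.

```python
# One commonly used family grouping (assumption; not listed in the text).
FAMILIES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "nonpolar": set("AVLIPFMW"),
    "uncharged polar": set("GNQCSTY"),
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter amino acid code."""
    code = residue.upper()
    for name, members in FAMILIES.items():
        if code in members:
            return name
    raise ValueError(f"unknown amino acid code: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues belong to the same family."""
    return family_of(original) == family_of(replacement)


print(is_conservative("D", "E"))  # True: both acidic
print(is_conservative("K", "D"))  # False: basic versus acidic
```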

Additionally, the α, β, and γ proteins [SEQ ID NOS:
2, 4, 6] of the invention may be altered, for example to
improve production or to confer some other desired
property upon the protein. For example, the
transmembrane region of the protein, identified herein,
may be removed, fully or in part, to obtain a soluble
form of the protein. Alternatively, a TP protein of the
invention may be truncated or modified to prevent
localization to the nucleus or into the nuclear membrane.
For example, the TPα may be modified to remove the
putative nucleus localization motif at amino acids
189-195 [aa 190-196 of SEQ ID NO: 2]. The carboxy terminal
transmembrane localization motifs of TPβ and TPγ can also
be removed, e.g., at aa 411-431 [aa 412-432 of SEQ ID NO:
4] (indicated by double underlining in Figs. 2 and 3).

Without being bound by the theory of the mechanism
by which these rhTP proteins function, the inventors
believe that each protein has unique characteristics.
Each of the proteins plays a role in cellular physiology,
especially in the immune system. As illustrated in the
Examples below, TP mRNA expression was detected in all
tissues examined, suggesting that some TP function(s) may
be important in many or all cell types. However, TP mRNA
expression was highest in adult thymus and in fetal
liver, a major fetal site for production of T cell
precursors. This suggests that TPs may play important
roles in T cell development and function.

Human TPs α, β, and γ [SEQ ID NOS: 2, 4, 6] do not
appear to contain a cleavable hydrophobic amino-terminal
signal peptide for directing the nascent peptide into the
ER/Golgi pathway for protein secretion. The apparent
absence of classical N-terminal hydrophobic cleavable
signal sequences for secretion in TP α, β, and γ suggests
that the proteins [SEQ ID NOS: 2, 4, 6] may be largely
localized intracellularly and may have important
intracellular functions. However, preliminary analysis
of conditioned media from human and mouse T-cell lines
using a TP immunoassay is consistent with the presence of
one or more forms of extracellular TP. Extracellular TP
may be generated by an alternative secretion pathway such
as that used by interleukin-1 or the fibroblast growth
factors, which also have no classical signal sequences
[A. Rubartelli et al, Biochem. Soc. Trans., 19:255-259
(1991)].

TPs β and γ [SEQ ID NOS: 4 and 6] contain a
hydrophobic domain near their carboxy termini, which may
be a transmembrane signal-anchor domain. This putative
transmembrane region is found at amino acid sequences
410-430, using the β numbering system [411-431 of SEQ ID
NO: 4]. In contrast, TP α [SEQ ID NO: 2] does not appear
to contain a membrane-spanning domain and is expected to
be a soluble protein. Preliminary analysis of
subcellular localization by immunofluorescence microscopy
confirms the localizations suggested above, i.e., TPβ and
TPγ being localized to the nuclear membrane and TPα being
localized within the nucleus.

Examination of TP α, β, and γ sequences [SEQ ID NOS:
2, 4, 6] for additional motifs revealed potential
phosphorylation sites for several protein kinases. Of
particular interest is a consensus sequence for tyrosine
phosphorylation in TPα [SEQ ID NO: 2] at Tyr626 (indicated
by underlining in Fig. 1). Typically, phosphorylation on
tyrosine serves to regulate activities of many proteins,
particularly proteins involved in controlling cell growth
and differentiation.

The nucleic acid sequences encoding these proteins
are themselves useful for a variety of diagnostic and
therapeutic uses, including gene therapy. Thus, the
present invention also provides the nucleic acid
sequences encoding hTPα, β and γ [SEQ ID NOS: 2, 4, 6]
and fragments thereof. The nucleic acid sequences of the
invention are characterized by the DNA sequences of Figs.
1-3 [SEQ ID NOS: 1, 3, 5], respectively. Note that the
first approximately 53 nucleotides of the TPγ sequence of
Fig. 3 may either be an alternatively spliced original
TPγ sequence, or alternatively may represent a non-TP
cloning artifact.

In addition to the fragments encoding the peptide
sequences of Table I, other fragments of these sequences
may prove useful for a variety of uses. Desirably, these
fragments are at least about 15 nucleotides in length and
encode a desired amino acid sequence, e.g. an epitope, a
therapeutically useful peptide, or the like. These
nucleotide sequences of the invention may be isolated as
in Examples 1 or 3, described below. Alternatively,
these sequences may be constructed using conventional
genetic engineering or chemical synthesis techniques.

According to the invention, the nucleic acid
sequences [SEQ ID NOS: 1, 3, 5] coding for, as well as
the encoded α, β, and γ proteins [SEQ ID NOS: 2, 4, 6]
described above and provided in Figs. 1-3, may be
modified. Utilizing the sequence data in these figures,
it is within the skill of the art to obtain other
polynucleotide sequences encoding the proteins of the
invention. Such modifications at the nucleic acid level
include, for example, modifications to the nucleotide
sequences which are silent or which change the amino
acids, e.g. to improve expression or secretion.
Alternatively, the amino acid sequence may be modified to
enhance protein stability or other characteristics, e.g.
binding activity or bioavailability.

In still another alternative, the polynucleotide
and/or protein sequences may be modified by adding
readily assayable tags to facilitate quantitation, where
desirable. Nucleotides may be substituted, inserted, or
deleted by known techniques, including, for example, in
vitro mutagenesis and primer repair. Also included are
allelic variations, caused by the natural degeneracy of
the genetic code. For example, in one of the hTPα cDNA
clones isolated, nucleotide 1792 is a G, which changes
amino acid 598 from Gln to Glu (compare to SEQ ID NO: 1 in
which nucleotide 1792 is a C). Note, also, nucleotide
579 is C in the β clone XT.6 and in a genomic clone, but
T in the sequenced subclone of γ clone XT.206, in both
cases encoding leucine.
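The two variations just described can be checked against the standard genetic code. The sketch below is illustrative, with hypothetical helper names. It assumes the figure numbering, in which nucleotide +1 is the first nucleotide of the codon for amino acid +1, so that codon n spans nucleotides 3n-2 through 3n. The actual codons at these positions are not reproduced here, so the code only enumerates which codons are compatible with each statement.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_position(nucleotide: int) -> tuple:
    """Return (codon number, position within codon), both 1-based."""
    if nucleotide < 1:
        raise ValueError("figure numbering of the coding region starts at +1")
    return (nucleotide - 1) // 3 + 1, (nucleotide - 1) % 3 + 1


def codons_for(amino_acid: str) -> list:
    """All codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(codon_position(1792))  # (598, 1): first base of codon 598
print(codon_position(579))   # (193, 3): third base of codon 193

# C -> G at the first position of either Gln codon gives a Glu codon.
gln_to_glu = [(c, "G" + c[1:]) for c in codons_for("Q")]
print(gln_to_glu)  # [('CAA', 'GAA'), ('CAG', 'GAG')]

# Leu codons ending in C that still encode Leu when the third base is T.
silent_leu = [c for c in codons_for("L") if c[2] == "C" and CODON_TABLE[c[:2] + "T"] == "L"]
print(silent_leu)  # ['CTC']
```

Under these assumptions nucleotide 1792 falls at the first position of codon 598, consistent with the reported Gln to Glu change, and nucleotide 579 falls at the third (wobble) position of codon 193, consistent with a silent C/T difference in a leucine codon.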

In addition to isolated nucleic acid sequences [SEQ
ID NOS: 1, 3, 5] encoding the thymopoietin proteins α, β,
and γ [SEQ ID NOS: 2, 4, 6] described herein, this
invention also encompasses other nucleic acid sequences,
such as those complementary to the illustrated DNA
sequences. Useful DNA sequences also include those
sequences which hybridize under high or moderately high
stringency conditions [see, T. Maniatis et al, Molecular
Cloning (A Laboratory Manual), Cold Spring Harbor
Laboratory (1982), pages 387 to 389] to the DNA sequences
illustrated in Figs. 1-3. An example of a highly
stringent hybridization condition is hybridization at
4XSSC at 65°C, followed by a washing in 0.1XSSC at 65°C
for an hour. Alternatively, an exemplary highly
stringent hybridization condition is in 50% formamide,
4XSSC at 42°C. Other, moderately high stringency
conditions may also prove useful, e.g. hybridization in
4XSSC at 55°C, followed by washing in 0.1XSSC at 37°C for
an hour. Alternatively, an exemplary moderately high
stringency hybridization condition is in 50% formamide,
4XSSC at 30°C.

Once constructed, or isolated, as described in
further detail in Examples 1 and 3 below, these DNA
sequences or suitable fragments are preferably employed
to obtain proteins of this invention.

The DNA sequences of the invention are inserted into
a suitable expression system to obtain the proteins of
the invention. Desirably, the polynucleotide sequence is
operably linked to a heterologous expression control
sequence permitting expression of the human thymopoietin
protein. Numerous types of appropriate expression
systems are known in the art for mammalian (including
human) expression, as well as insect, yeast, fungal, and
bacterial expression, by standard molecular biology
techniques. Bacterial expression systems, using such
host cells as E. coli, are desirable for expression of
thymopoietin.

Mammalian cell expression vectors are also desirable
for expression. The mammalian cell expression vectors
described herein may be synthesized by techniques well
known to those skilled in this art. The components of
the vectors, e.g. replicons, selection genes, enhancers,
promoters, and the like, may be obtained from natural
sources or synthesized by known procedures.

The transformation of these vectors into appropriate
host cells can result in expression of the selected
thymopoietin proteins. Other appropriate expression
vectors, of which numerous types are known in the art for
mammalian expression, can also be used for this purpose.
Suitable cells or cell lines for this method are
mammalian cells, such as Human 293 cells, Chinese hamster
ovary cells (CHO), the monkey COS-1 cell line or murine
3T3 cells derived from Swiss, Balb-c or NIH mice. The
selection of suitable mammalian host cells and methods
for transformation, culture, amplification, screening,
and product production and purification are known in the
art. [See, e.g., Gething and Sambrook, Nature, 293:620-625
(1981), or alternatively, Kaufman et al, Mol. Cell.
Biol., 5(7):1750-1759 (1985) or Howley et al, U. S.
Patent 4,419,446]. Another suitable mammalian cell line
is the CV-1 cell line.

Similarly useful as host cells suitable for the
present invention are bacterial cells. For example, the
various strains of E. coli (e.g., HB101, MC1061, and
strains used in the following examples) are well-known as
host cells in the field of biotechnology. Various
strains of B. subtilis, Pseudomonas, other bacilli and
the like may also be employed in this method.

Many strains of yeast cells known to those skilled
in the art are also available as host cells for
expression of the polypeptides of the present invention.
Additionally, where desired, insect cells may be utilized
as host cells in the method of the present invention.
[See, e.g. Miller et al, Genetic Engineering, 8:277-298
(Plenum Press 1986) and references cited therein].
Fungal cells may also be employed as expression systems.

The host cells transformed with the one or more
vectors carrying the thymopoietin DNA, e.g. by
conventional means, may then be cultured under suitable
conditions to obtain expression of the desired protein.
The method of this present invention therefore comprises
culturing a suitable cell or cell line, which has been
transformed with a DNA sequence coding for thymopoietin,
the coding sequence under the control of a
transcriptional regulatory sequence. The expressed
protein is then recovered, isolated, and purified from
the culture medium (or from the cell, if expressed
intracellularly) by appropriate means known to one of
skill in the art.

For example, the proteins may be isolated following
cell lysis in soluble form, or extracted in guanidine
chloride. For example, a currently preferred method for
purification of hTPα [SEQ ID NO: 2] is by lysis of the
E. coli by freezing and thawing followed by sonication,
and extraction of the recombinant protein with solutions
containing 20 mM Tris HCl, pH 7.6, 1M urea or 1M guanidine
HCl. In addition, molecular sieving, e.g. using a 300
kDa sieve [BioRad TSK-250] column, may be used.

If desired, the TP proteins of the invention may be
produced as a fusion protein. For example, it may be
desirable to produce such TP fusion proteins, to enhance
expression of the protein in a selected host cell, or to
improve purification. Suitable fusion partners for the
rhTP proteins of the invention are well known to those of
skill in the art and include, among others,
β-galactosidase and poly-histidine.

Other uses for the polynucleotide sequences of this invention
include diagnostic and therapeutic uses. For example, the novel
recombinant hTP nucleic acid sequences or genes of the
invention, or suitable fragments thereof, are useful in gene
therapy for correcting abnormalities, for example, those
associated with an immune or nervous system disorder.

Another example involves incorporating a desired hTP nucleic
acid sequence of the invention into a suitable vector or other
delivery system. Suitable delivery systems are well known to
those of skill in the art. Vectors containing such sequences may
be administered, thus treating deficiencies of TP via in vivo
expression of the proteins of the invention. Such delivery
systems enable the desired hTP gene to be incorporated into the
target cell and to be translated by the cell. In such a manner,
a recombinant hTP protein of the invention can be provided to a
cell, particularly a cell in an individual having a condition
characterized by a deficiency in TP.

These polynucleotide sequences of this invention may also be
associated with detectable labels or components of label systems
conventionally used in diagnostic or therapeutic methods. As
diagnostic agents the polynucleotide sequences may be employed
to detect or quantitate normal or mutant hTP mRNA or detect
mutations in TP DNA in a patient sample.

The TP α, β and γ proteins [SEQ ID NOS: 2, 4, 6] of the
invention and compositions containing these proteins demonstrate
a variety of regulatory effects on the mammalian immune system.
For example, peptides of this invention offer treatment
therapies for chronic infection, autoimmune disorders, and
certain affective psychiatric or neurological disorders, as well
as other conditions characterized by a disorder of the immune
system. Because of the immunomodulatory characteristics of the
subject proteins, they are therapeutically useful in the
treatment of humans, and possibly animals, since they are
capable of effecting changes in the immune system of the mammal.

These proteins have therapeutic uses in humans. For example, the
rhTP proteins in a pharmaceutical composition of the present
invention may be administered in vivo to raise levels of
circulating TP in an individual requiring same, e.g., a patient
suffering from disorders, e.g., stress related to insufficient
levels of circulating hTP. Alternatively, the rhTP proteins of
the invention may be administered in such a way as to produce a
localized response. It is anticipated that these rhTP proteins
will have longer half-lives than thymopentin.

Also, the proteins according to the present invention may be
used to diminish the effects of aging on the immune system. As
the thymus shrinks with age, the level of thymopoietin
decreases. Thus, administration of proteins of this invention
which have biological activity similar to thymopoietin can help
reduce the effects of aging related to inefficient or
non-functioning immune systems.

The invention further provides pharmaceutical compositions and a
method for treatment of conditions resulting from disorder of
the immune system and/or nervous system of a subject, which
comprises administering to said subject a
therapeutically-effective amount of at least one of the proteins
or pharmaceutical compositions of this invention. Such
pharmaceutical compositions of the invention contain one or more
of the above-described proteins or acid- or base-addition salts
thereof. Optionally, such compositions may further contain
conventional therapeutic or other agents useful in treating the
immune or other disorder. The subject proteins or pharmaceutical
compositions containing the proteins or their acid or basic
salts are generally considered to be useful when cellular
immunity is an issue and particularly when there are
deficiencies in immunity. The pharmaceutical compositions of the
invention are also useful in treating imbalances and
dysfunctions in the central nervous system.

As used herein, the term "therapeutically-effective amount"
means an amount which is effective to treat the conditions
referred to above. A protein of the present invention is
generally effective when parenterally administered in amounts
above about 0.01 μg protein per kg of body weight (μg/kg), and
preferably from about 1 μg/kg to about 10 mg/kg.
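
The preferred range above is stated per kilogram of body weight,
so a total amount follows by simple multiplication. The sketch
below is purely an arithmetic illustration of that unit
conversion, not dosing guidance; the function name and the 70 kg
example weight are assumptions introduced here.

```python
def dose_range_ug(body_weight_kg, low_ug_per_kg=1.0, high_ug_per_kg=10_000.0):
    """Return (low, high) total amounts in micrograms.

    Defaults mirror the preferred range in the text:
    about 1 ug/kg up to about 10 mg/kg (10 mg = 10,000 ug).
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_kg * low_ug_per_kg, body_weight_kg * high_ug_per_kg


# Hypothetical 70 kg subject (assumed value, not from the text).
low, high = dose_range_ug(70.0)
print(f"{low:.0f} ug to {high / 1000:.0f} mg")
```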

To prepare the pharmaceutical compositions of the present
invention, a protein of this invention is combined as the active
ingredient in intimate admixture with a pharmaceutical carrier
according to conventional pharmaceutical compounding techniques.
This carrier may take a wide variety of forms depending on the
form of preparation desired for administration, e.g.,
sublingual, rectal, nasal, or parenteral. The presently
preferred route of administration is parenteral.

For parenteral products the carrier will usually comprise
sterile water, although other ingredients may be included, e.g.,
to aid solubility or for preservation purposes. Injectable
suspensions may also be prepared, in which case appropriate
liquid carriers, suspending agents, and the like may be
employed.

Both the nucleic acid and amino acid sequences of the invention
are useful for generating reagents for use in diagnostic assays.
The nucleic acid sequences, or suitable fragments thereof, are
also useful for detecting thymopoietin mRNA levels and gene
mutations. Further, antibodies, including monoclonal,
polyclonal, and recombinant antibodies, may be generated to
these peptide sequences which may similarly be useful for
measuring thymopoietin levels. Such monoclonal antibodies may be
generated using the standard Kohler and Milstein technique as
well as well known modifications thereof. Alternatively, other
known techniques for the generation of monoclonal or recombinant
antibodies may be employed using fragments of the proteins or
polynucleotide sequences of this invention to generate
antibodies suitable for both therapeutic and diagnostic
application.

Thus, the invention provides a method for diagnosing an immune
or nervous system disorder, and/or detecting a condition
associated with increased or decreased levels of thymopoietin
using conventional diagnostic assay methods. Such a diagnostic
method may be performed using a monoclonal or polyclonal
antibody directed against an epitope of protein α, β, or γ, or a
DNA probe of the invention, in an appropriate assay system.

The following examples illustrate the preferred methods for
isolating and expressing the novel sequences of the invention.
In view of the disclosure of these sequences, other methods for
obtaining them are available to the art and are therefore
encompassed in this invention. These examples are illustrative
only and do not limit the scope of the invention.

EXAMPLE 1 - ISOLATION OF HUMAN THYMOPOIETIN cDNA CLONES

Initial human thymopoietin cDNA clones were isolated from a
commercial cDNA library prepared from human thymus RNA in the
vector lambda GT10 (Clontech, Palo Alto, CA). The sequence of
human thymopoietin α was determined from the overlapping cDNA
clones λhTP-T.32 and λhTP-T.153, which together provide the
complete open reading frame, and was verified in the genomic
clone λSHG-1, obtained from a commercial genomic library in
vector λFIXII [Stratagene]. Isolation of the clones from which
the TP proteins α, β and γ of the invention were derived was
performed as follows.

The library was probed using two 95-mer oligonucleotides
containing a 14 nucleotide overlap based on the bovine
thymopoietin sequence of Zevin-Sonkin et al, Immunol. Lett.,
31:301-310 (1992). The sense oligonucleotide sequence was:
GGGAATTCGC CGCCGAGATG CCGGAGTTCC TGGAAGACCC CTCGGTCCTG
ACGAAAGAGA AGTTGAAGAG TGAGTTGGTC GCCAACAATG TGACG: SEQ ID NO: 8.
The antisense oligonucleotide sequence was:
GGGAATTCAG CGCTTCAGGG CCGTCAGGTG CTGCAGGTAG AGCTGCACAT
ACACGTCTTT GCGCTGCTCC CCGGCCGGGA GCGTCACATT GTTGG: SEQ ID NO: 9.
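
The stated 14 nucleotide overlap between the two 95-mers can be
checked directly from the sequences given above. The following
is a minimal sketch; the helper names are introduced here for
illustration, and the fill-in length assumes both strands are
fully extended after annealing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_over(sense, antisense, k):
    """True if the 3'-terminal k bases of the two oligos are complementary."""
    return sense[-k:] == reverse_complement(antisense[-k:])


# SEQ ID NO: 8 and SEQ ID NO: 9 as printed in the text (spaces removed).
SENSE = "".join("""GGGAATTCGC CGCCGAGATG CCGGAGTTCC TGGAAGACCC CTCGGTCCTG
ACGAAAGAGA AGTTGAAGAG TGAGTTGGTC GCCAACAATG TGACG""".split())
ANTISENSE = "".join("""GGGAATTCAG CGCTTCAGGG CCGTCAGGTG CTGCAGGTAG AGCTGCACAT
ACACGTCTTT GCGCTGCTCC CCGGCCGGGA GCGTCACATT GTTGG""".split())

OVERLAP = 14
# Length of the duplex after both 3' ends are extended (assumption:
# complete fill-in of both strands).
FILLED_IN_LENGTH = len(SENSE) + len(ANTISENSE) - OVERLAP

print(len(SENSE), len(ANTISENSE), anneals_over(SENSE, ANTISENSE, OVERLAP))
print(FILLED_IN_LENGTH)
```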

Clones λhTP-T.6 (hTPβ), λhTP-T.17 (hTPβ), and λhTP-T.32 (hTPα)
were among the clones isolated in this initial screen. Clone
λhTP-T.153 (hTPα) was among the clones isolated in a subsequent
screen in which the probe was a 0.3 kb fragment isolated from
the 3' end of λhTP-T.32 by digestion with the restriction
enzymes BamHI and EcoRI. Clones λhTP-T.206 (hTPγ) and λhTP-T.209
(hTPβ) were among the clones isolated in a screen in which the
probe was two overlapping oligonucleotides derived from the 3'
end of λhTP-T.17, the sense oligonucleotide being SEQ ID NO: 10:
TCTATCAAGC TATGGAAACC AACCAAGTAA ATCCCTTCTC TAATT and the
antisense oligonucleotide being SEQ ID NO: 11: CATTCAGTTG
GATTTTCTAG GGTCAACATG AAGAGAATTA GAGAAGGGAT.

The sequence of human thymopoietin γ was determined from clone
λhTP-T.206. The sequence of human thymopoietin β was determined
from the overlapping clones λhTP-T.6, λhTP-T.17, and λhTP-T.209.
The clone numbers are based solely on the order of isolation
from the library.

EXAMPLE 2 - ANALYSIS OF TP CLONES 

Sequences were determined using Sequenase Version 2.0 (United
States Biochemical) or Taq polymerase (Perkin-Elmer), on the
original clone DNA, or on fragments subcloned into plasmid
vectors. All sequences reported here were determined on both
strands of at least one clone, and, except for the 3'
untranslated sequences of TPs β and γ, have been confirmed in
one or more additional clones.

The sequences of human TP α, β and γ are similar but not
identical to the bovine sequence of Zevin-Sonkin et al, cited
above, between amino acids 1-81, but show no further similarity
beyond this point. Sequencing of the human TP gene [SEQ ID NO:
1, 3, 5] in a genomic clone has revealed that the DNA sequence
encoding amino acid 81 lies in the middle of an exon with no
nearby potential splice donor sites, indicating that a TP
containing C-terminal sequence similar to the bovine sequence is
not produced from the human TP gene [SEQ ID NO: 1, 3, 5].
Protein sequences were searched for motifs in release 9 of the
Prosite database [A. Bairoch, Nucl. Acids Res., 21:3097-3103
(1993)] using MacPattern [R. Fuchs, Comput. Appl. Biosci.,
7:105-106 (1991)]. This analysis revealed several potential
phosphorylation sites for protein kinases, including KTYDAASX,
amino acids 619-626 of TPα [620-627 of SEQ ID NO: 2], which
matches a consensus sequence for phosphorylation by some
tyrosine kinases ([K/R]-X(2-3)-[D/E]-X(2-3)-Y) [T. Patschinsky
et al, Proc. Natl. Acad. Sci. USA, 79:973-977 (1982)].
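
A motif search of this kind can be sketched with a regular
expression. The pattern below assumes the Prosite-style reading
of the tyrosine kinase consensus (a K or R, then an acidic
residue two or three positions later, then a tyrosine so that
the site spans eight residues); the test peptides are made-up
illustrations, not sequences from this application.

```python
import re

# Assumed consensus: [RK]-x(2)-[DE]-x(3)-Y or [RK]-x(3)-[DE]-x(2)-Y.
# The lookahead lets overlapping candidate sites all be reported.
TYR_KINASE_SITE = re.compile(r"(?=([RK].{2}[DE].{3}Y|[RK].{3}[DE].{2}Y))")


def find_sites(protein):
    """Return (1-based start, matched peptide) for every candidate site."""
    return [(m.start() + 1, m.group(1)) for m in TYR_KINASE_SITE.finditer(protein)]


# Hypothetical peptide containing one site (R..E...Y).
print(find_sites("GGRSLEAPKYGG"))
```
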
Hydropathy analysis was performed by the method of D. M.
Engelman et al, Ann. Rev. Biophys. Biophys. Chem., 15:321-353
(1986) as implemented in MacVector (Eastman Kodak Chemical Co.,
software version 4.1) and revealed that TPs β and γ [SEQ ID NOS:
4 and 6] contain a very hydrophobic region close to their
carboxy termini that may function as a transmembrane domain. No
compelling similarities to previously known protein or nucleic
acid sequences other than TP were revealed.
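
The general idea of a sliding-window hydropathy scan can be
sketched as follows. Note the substitution: this sketch uses the
widely published Kyte-Doolittle scale rather than the Engelman
et al scale named above, and the window size, threshold, and
test sequence are assumptions chosen for illustration only.

```python
# Kyte-Doolittle hydropathy values (swapped in for the Engelman scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window of `window` residues."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows at or above the threshold."""
    return [i + 1 for i, value in enumerate(hydropathy_profile(seq, window))
            if value >= threshold]


# Made-up sequence: charged flanks around a 21-residue hydrophobic core.
TOY = "DEKRSDEKRS" + "LIVALLIVAFLLIVALLIVAL" + "KDERSKDERS"
print(hydrophobic_windows(TOY))
```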

EXAMPLE 3 - ISOLATION AND ANALYSIS OF GENOMIC CLONES

λSHG-1 was isolated from a human placenta genomic library in
λ FIX II (Stratagene) by hybridization with a mixture of probes
containing nucleotides 1004 to 2273 of the human TP α cDNA
sequence [SEQ ID NO: 1] and nucleotides -189 to 1413 of the
human TP γ cDNA sequence [SEQ ID NO: 5], by standard methods
[Sambrook et al, cited above]. For isolation of bacteriophage P1
clones, PCR primers were designed from the TP α-specific region
of the TP α cDNA sequence [SEQ ID NO: 1], and shown to amplify a
product of the correct size from total human genomic DNA, using
PCR conditions as follows: initial denaturation at 95 °C for
2 min, followed by 30 cycles each at 95 °C for 1 min, 53 °C for
1 min, and 72 °C for 2 min, in a Coy Lab Products thermocycler.
PCR primers and sequences were SHG1F1
(5'-TCCTGTCTCCTCCAAGAAAAGTCC-3', sense nucleotides 1262 to 1285,
SEQ ID NO: 21) and SHG1B2 (5'-TGCTGTGATTGCTTAGCCACTTC-3',
antisense nucleotides 1442 to 1420, SEQ ID NO: 22) for clones
P1.515 and P1.516, and SHG1F7 (5'-CCGAACTGATGTCTTCTTTTGCC-3',
sense nucleotides 1358-1380, SEQ ID NO: 23) and SHG1B5
(5'-ATCTGTGTTGCCTCCCTGGAAG-3', antisense nucleotides 1586 to
1565, SEQ ID NO: 24) for clone P1.517. Clones P1.515
(DMPC-HFF#1B-0606A), P1.516 (DMPC-HFF#1B-0943F), and P1.517
(DMPC-HFF#1B-1120E), cloned in pAd10SacBII [Pierce et al, Proc.
Natl. Acad. Sci. USA, 89:2056-2060 (1992)], were isolated at
Genome Systems, Inc. from the DuPont Merck Pharmaceutical
Company Human Foreskin Fibroblast P1 Library #1 Series B
(compressed format). Clones were initially characterized by
partial restriction mapping and by hybridization with defined
regions of TP cDNAs, and restriction fragments of interest were
subcloned into pBluescript II KS(+) (Stratagene) for sequencing.
Some regions were amplified by PCR and subcloned into pCR-Script
SK(+) (Stratagene) for sequencing. All exons and their borders
were sequenced completely. The sequence encoding the 5'
untranslated region of TP mRNAs and the sequence of the 5'
flanking region were determined on both strands.
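
The "product of the correct size" for each primer pair follows
from the cDNA coordinates given above, provided no intron lies
between the primers in genomic DNA (an assumption here,
consistent with the α-specific region lying within one exon).
The sketch below also checks each primer's length against its
stated coordinates; the dictionary keys and helper names are
illustrative.

```python
# (sequence, 5' coordinate, 3' coordinate) on the TP alpha cDNA,
# copied from the text. Reverse primers run from high to low.
PRIMER_PAIRS = {
    "P1.515/P1.516 pair": {
        "forward": ("TCCTGTCTCCTCCAAGAAAAGTCC", 1262, 1285),
        "reverse": ("TGCTGTGATTGCTTAGCCACTTC", 1442, 1420),
    },
    "P1.517 pair": {
        "forward": ("CCGAACTGATGTCTTCTTTTGCC", 1358, 1380),
        "reverse": ("ATCTGTGTTGCCTCCCTGGAAG", 1586, 1565),
    },
}


def span(five_prime, three_prime):
    """Number of nucleotides covered by a primer, inclusive."""
    return abs(three_prime - five_prime) + 1


def amplicon_size(pair):
    """Distance between the two primer 5' ends, inclusive."""
    return pair["reverse"][1] - pair["forward"][1] + 1


for name, pair in PRIMER_PAIRS.items():
    for seq, start, end in pair.values():
        assert len(seq) == span(start, end)  # coordinates match primer length
    print(name, amplicon_size(pair), "bp")
```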

The restriction mapping and partial sequencing of the
overlapping clones confirmed that TPs α, β, and γ are produced
via alternative mRNA splicing from the single gene. All exons
and their borders were sequenced from genomic clones except for
the 3' end of exon 8, containing part of the 3' untranslated
region of thymopoietin β and γ mRNAs, which has not yet been
isolated or sequenced from genomic or cDNA clones. The gene
contains eight known exons spread over ~35 kb. The amino
terminal common region present in all three TPs α, β, and γ is
encoded by three exons. The α-specific domain and 3'
untranslated region of TP α mRNA are encoded by a single large
exon. The α-specific exon is followed by a small exon encoding
sequences common to TPs β and γ, the exon encoding the
β-specific domain, and then two additional exons encoding
carboxy-terminal sequences common to TPs β and γ.

B. Comparison of Sequenced Coding and Non-coding Regions of the
TP Gene and Published Sequences
Genomic sequences were aligned with cDNA sequences using the
MacVector software package (Kodak). Sequences were compared to
nucleic acid and protein databases using the BLAST programs
[Altschul et al, J. Mol. Biol., 215:403-410 (1990)] of the
National Center for Biotechnology Information via the Internet.
To search the 5' flanking region of the gene for candidate
sequences similar to known binding sites for transcriptional
regulatory proteins, the sequence was analyzed with the
FINDPATTERNS program of version 7 of the GCG software package
(Genetics Computer Group, Program Manual for the GCG Package,
Version 7, April 1991) using the TFD transcription factor
database [Ghosh, Trends Biochem. Sci., 16:445-447 (1991)], and
with MacVector using the TFD database and additional sets of
nucleic acid motifs.

The comparison did not reveal any compelling sequence
similarities except for common repetitive sequences such as Alu
repeats. Upstream of the TP gene, however, on the antisense
strand, nucleotides -853 to -915 were found to be identical to
Genbank sequence D19953, a 63 nucleotide sequence from a cDNA
clone picked at random from a human promyelocyte cDNA library.

TP β shares many characteristics with the recently described rat
liver inner nuclear membrane protein LAP2 [Foisner and Gerace,
Cell, 73:1267-1279 (1993); Furukawa et al, Molec. Biol. Cell,
5S:341a (1994)], including localization to the inner nuclear
membrane, nearly identical size (453 and 452 amino acids),
identically located single C-terminal transmembrane domain at
amino acids 410-433, in vivo phosphorylation, and the presence
of sequence motifs for phosphorylation by CDC2 (in the case of
TP β, amino acids 255 to 260, RTPRKR (SEQ ID NO: 4)). Comparison
of the human TP β sequence with the unpublished rat LAP2
sequence revealed ~90% identity, which strongly suggests that
TP β is the human homologue of LAP2. LAP2 has been shown to bind
lamin B1 and chromosomes in a manner regulated by its
phosphorylation, and has been proposed to play key roles in
attaching the inner nuclear membrane to the nuclear lamina and
chromosomes, and in promoting reassembly of the nuclear envelope
at the end of mitosis.

EXAMPLE 4 - HUMAN GENOMIC SOUTHERN BLOT

The TP gene was detected on a human genomic Southern blot
obtained from Clontech with a probe prepared from overlapping
complementary oligonucleotides encoding amino acids 2 to 53 of
the amino-terminal common region of TPs α, β, and γ [SEQ ID NO:
2, 4, 6]. The oligonucleotides, nucleotides sense 1-87 and
antisense 156 to 64 [SEQ ID NO: 1, 3, 5], were annealed and
radiolabeled by extension of their 3' ends. Hybridization was in
50% formamide, 3X SSPE [Sambrook et al, cited above] at 42 °C,
and the highest stringency wash was in 0.1X SSPE at 50 °C.

A single band was observed in all restriction digests. This is
consistent with the previous inference from analysis of the cDNA
sequences that thymopoietins α, β, and γ are derived from a
single gene by alternative mRNA splicing.

EXAMPLE 5 - CHARACTERIZATION OF GENOMIC CLONES 

A. Mapping of 5' Ends of Thymopoietin mRNAs by Primer Extension

Primer T.6.G, 5'-GCGCTTGCTCCTGGCCGGGG-3', antisense nucleotides
256 to 237 in Fig. 7 [SEQ ID NO: 25], radiolabelled using
33P-ATP (Amersham) and T4 polynucleotide kinase (GIBCO/BRL), was
annealed to 1 μg of total RNA for 20 min at 58 °C in 50 mM
Tris-Cl pH 8.3, 75 mM KCl, 3 mM MgCl2, 10 mM DTT, and 0.5 mM
each dNTP, and then extended with RNase H- MMLV reverse
transcriptase (Promega) at 42 °C for 30 min. After addition of
loading dye and heating at 90 °C for 10 min, the reaction
products were separated on a 6% acrylamide/urea gel, and then
exposed to Kodak Biomax film.

Two major transcription start sites were mapped by primer
extension 313 and 300 bp upstream of the translation initiation
codon (Fig. 12, SEQ ID NO: 18). This location for the 5' ends of
the mRNAs is supported by the calculated length of TP α mRNA, as
discussed below, and by the inability of a probe from -91 to
-276 to detect TP mRNAs on Northern blots. However, less
prominent primer extension products smaller or larger than the
major products were seen in some experiments, and 2 of 10 cDNAs
that extend 5' of the translation initiation codon extend
farther 5' than the major transcription start sites determined
by primer extension (Fig. 12, SEQ ID NO: 18). Thus, there may be
some heterogeneity in transcription start sites, as previously
reported for other genes that contain GC-rich 5' flanking
sequences and are expressed widely in many different tissues. As
the sequence surrounding the 5' end of the gene is GC-rich (62%
GC from -200 to +200) and has a potential for folding into
stable secondary structures, features that favor stalling of
reverse transcriptases, it is possible that additional
transcription start sites exist that were undetected.
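
A GC percentage such as the 62% quoted above is a simple base
count over the chosen window. The sketch below shows the
calculation; since the -200 to +200 genomic sequence is not
reproduced in this section, the example input is the primer
T.6.G sequence given earlier, used only as a convenient sample.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Primer T.6.G from the text, used here as sample input.
print(gc_percent("GCGCTTGCTCCTGGCCGGGG"))
```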

Analysis of the sequence of the 5' flanking region revealed
several potential binding sites for known transcription factors.
No obvious TATAAA-like sequence that could be a binding site for
the general transcription factor TFIID-TBP is found in the usual
position ~30 bp 5' to the transcription start sites, an absence
that is characteristic of some other genes expressed in many
tissues; however, the closely related sequence TTTAAA is present
~100 bp upstream. Two CCAAT boxes, potential binding sites for
members of the CTF/NF-1 family of transcription factors [Santoro
et al, Nature, 334:218-224 (1988)], are present within direct
repeat sequences ~30 bp 5' to and between the two major apparent
transcription start sites determined by primer extension.
Potential binding sequences for transcription factor Sp1
[Kadonaga et al, Trends Biochem. Sci., 11:20-23 (1986)] are
found within a large palindromic sequence at -121 to -99, at
-450, and between the apparent transcription start sites. A
sequence similar to an interferon stimulated response element
(ISRE) [Pellegrini et al, Trends Biochem. Sci., 18:338-342
(1993)] is at -140 on the antisense strand. Two potential
binding sites for the histone H4 gene transcription factor
H4TF-1 [Dailey et al, Genes and Develop., 2:1700-1712 (1988)]
are present at -450 and at -220. An "octamer" sequence that is a
potential binding site for transcription factors OTF-1, OTF-2, a
transcription factor that plays a role in the cell
cycle-dependent transcription of the histone H2B gene, and
related proteins is found at -1395 [Schaffner, Trends Genet.,
5:37-39 (1989)]. The presence in the 5' flanking region of a
sequence on the antisense strand matching the short 3'-end
sequence tag of a randomly isolated cDNA [Okubo et al, Genbank
D19953] could indicate another transcription unit overlapping
the TP gene on the antisense strand.
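
Searching a flanking region for short motifs on both strands, in
the spirit of the pattern search described above, can be
sketched as follows. This is not the FINDPATTERNS program or the
TFD database; the three motifs are ones named in the text, and
the input sequence is a made-up toy string.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
MOTIFS = ("TATAAA", "TTTAAA", "CCAAT")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, motifs=MOTIFS):
    """Return (motif, strand, 1-based leftmost sense-strand position)."""
    hits = []
    rc = reverse_complement(seq)
    n = len(seq)
    for motif in motifs:
        pattern = re.compile(f"(?={motif})")  # lookahead allows overlaps
        for m in pattern.finditer(seq):
            hits.append((motif, "+", m.start() + 1))
        for m in pattern.finditer(rc):
            # Map the hit on the reverse complement back to sense coordinates.
            hits.append((motif, "-", n - (m.start() + len(motif)) + 1))
    return sorted(hits, key=lambda hit: hit[2])


# Toy sequence (not from the TP gene). Palindromic motifs such as
# TTTAAA are reported on both strands at the same position.
for hit in scan("GGCCAATCGTTTAAAGGATTGGCC"):
    print(hit)
```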

Analysis of sequences of exon borders reveals that the
alternative splicing of TP mRNAs involves competition of 3'
splice sites. Exon 1 encodes the 5' untranslated region and
amino acids 1-92 [SEQ ID NO: 2, 4, 6], exon 2 encodes amino
acids 93-134 [SEQ ID NO: 2, 4, 6], and exon 3 encodes amino
acids 135-187 [SEQ ID NO: 2, 4, 6]. These exons are
constitutively spliced to form the common sequences present in
all three TPs α, β, and γ (See Table II below). Exon 4 encodes
TP α-specific amino acids 188 to 693 [SEQ ID NO: 2], including a
previously noted putative basic nuclear localization motif at
amino acids 189-195, and the 3' untranslated region of TP α
mRNA. Exon 5 encodes amino acids 188 to 220 of TPs β and γ [SEQ
ID NO: 4 and 6]. Exon 6 encodes amino acids 221-329 of TP β, the
β-specific domain [SEQ ID NO: 4]. Exon 7 encodes amino acids
330-359 of TP β (221-250 of TP γ), and exon 8 encodes amino
acids 360-453 of TP β (251-344 of TP γ (SEQ ID NO: 4)),
including the hydrophobic putative membrane-spanning domain at
β amino acids 410-433 that is thought to localize TPs β and γ
[SEQ ID NO: 4 and 6] to the nuclear membrane.
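
The exon assignments above determine the length of each isoform.
As a consistency check, the sketch below sums the residues
contributed by each exon; exon spans use the numbering given in
the text (α numbering for exon 4, β numbering for exons 5 to 8),
and the variable names are illustrative.

```python
# Amino acid span encoded by each exon, as stated in the text.
EXON_CODING_SPANS = {
    1: (1, 92), 2: (93, 134), 3: (135, 187),
    4: (188, 693),                      # alpha-specific
    5: (188, 220), 6: (221, 329),       # beta numbering
    7: (330, 359), 8: (360, 453),       # beta numbering
}

# Exons joined in each mRNA, following the splicing pattern described.
ISOFORM_EXONS = {
    "alpha": [1, 2, 3, 4],
    "beta": [1, 2, 3, 5, 6, 7, 8],
    "gamma": [1, 2, 3, 5, 7, 8],        # exon 6 skipped
}


def isoform_length(name):
    """Total residues contributed by the exons of one isoform."""
    return sum(end - start + 1
               for start, end in (EXON_CODING_SPANS[e] for e in ISOFORM_EXONS[name]))


for name in ISOFORM_EXONS:
    print(name, isoform_length(name))
```

The totals agree with the end coordinates quoted for each
isoform (693, 453, and 344 residues).
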

Table II

Splice Site Sequences of the Human Thymopoietin Gene*

5' Splice Sites

Exon         EXON       | INTRON                  SEQ ID NO
1 (αβγ)      CGTCGGCAGG | gtaaggacgcggggccgggg    26
2 (αβγ)      CCTATTGTGG | gtaagttgataaaatttcaa    27
3 (αβγ)      AATGAAGAAG | gtaaaattttaaatgatagt    28
5 (βγ)       GCACAATCAG | gtactttagttttattacca    29
6 (β)        AAGAGCTGAA | gtaaatgaatacaatttaga    30
7 (βγ)       CAGGAATTAG | gtattcagatacatttaaac    31
Consensus    (C/A)AG    | gt(a/g)agt              32

3' Splice Sites

INTRON               | EXON        Exon      SEQ ID NO   Amino Acid Interrupted
ttactggactttgtttacag | AAAGCCACAA  2 (αβγ)   33          Arg 92/Lys 93
caagttctgccttaatccag | GAACAACCAG  3 (αβγ)   34          Gly 135
tgcctcttttgcctctacag | GAAAGAAGAA  4 (α)     35          α Gly 188
ttctccaatgttatttccag | ACTCTAAAAT  5 (βγ)    36          βγ Asp 188
atgtgttgatgcttgaatag | AGCTATTCTC  6 (β)     37          Gln 220/Ser 221
gtttgtctgtttcttattag | GTGGGAGAAA  7 (βγ)    38          γ Gln 220/Val 221
cctcctttcactcccaacag | TGCTAGTTCC  8 (βγ)    39          β Ser 359/γ Ser 250
(y)n n y a g         | G           consensus 40

* Consensus sequences from Green, Annu. Rev. Cell Biol.,
7:559-599 (1991).

All splice sites for TPs α, β, and γ contain the canonical GT
and AG dinucleotides at the intron borders (See Table II). The
splice sites match consensus splice site sequences to varying
extents. The poly-pyrimidine tract of the exon 6 3' splice site
is poorly conserved, containing 8 purines. This relatively poor
polypyrimidine tract of the exon 6 3' splice site may weaken the
binding of splicing factors such as U2AF that interact with this
region [Green, cited above, (1991)], facilitating competition by
the 3' splice site of exon 5 for splicing to exon 7, and thus
production of both TP β and TP γ mRNAs.
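
Both observations can be checked against the intron sequences
listed in Table II. The sketch below confirms the GT and AG
dinucleotides and ranks the 3' splice sites by pyrimidine count
over the 18 intron bases shown upstream of the AG. That window
is an assumption made here, so the purine count it gives for
exon 6 need not equal the figure of 8 quoted above, which may
have been taken over a different window.

```python
# Intron ends as listed in Table II, keyed by the adjacent exon.
DONOR_INTRONS = {
    1: "gtaaggacgcggggccgggg", 2: "gtaagttgataaaatttcaa",
    3: "gtaaaattttaaatgatagt", 5: "gtactttagttttattacca",
    6: "gtaaatgaatacaatttaga", 7: "gtattcagatacatttaaac",
}
ACCEPTOR_INTRONS = {
    2: "ttactggactttgtttacag", 3: "caagttctgccttaatccag",
    4: "tgcctcttttgcctctacag", 5: "ttctccaatgttatttccag",
    6: "atgtgttgatgcttgaatag", 7: "gtttgtctgtttcttattag",
    8: "cctcctttcactcccaacag",
}


def pyrimidine_count(intron_end):
    """Count c/t in the listed intron bases upstream of the terminal 'ag'."""
    tract = intron_end[:-2]
    return sum(base in "ct" for base in tract)


canonical = (all(s.startswith("gt") for s in DONOR_INTRONS.values())
             and all(s.endswith("ag") for s in ACCEPTOR_INTRONS.values()))
counts = {exon: pyrimidine_count(s) for exon, s in ACCEPTOR_INTRONS.items()}
weakest = min(counts, key=counts.get)
print(canonical, counts, weakest)
```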

B. Isolation and Sequencing of Thymopoietin α mRNA 3' UTR

Complete 3' untranslated regions were amplified by two rounds of
3' rapid amplification of cDNA ends (3' RACE) [Frohman et al,
Proc. Natl. Acad. Sci. USA, 85:8998-9002 (1988)] using a kit
from GIBCO/BRL, according to the manufacturer's instructions,
with TP α-specific primers from near the 3' end of the
previously determined TP α cDNA sequence [SEQ ID NO: 1]. First
strand cDNA was synthesized from 1 μg of total RNA using the
adapter primer 5'-GGCCACGCGTCGACTAGTAC(T)17-3' [SEQ ID NO: 41].
Primers for the first round of PCR were the universal
amplification primer (5'-CUACUACUACUAGGCCACGCGTCGACTAGTAC-3',
SEQ ID NO: 42) and TP α RACE primer 1 (CAAAATGTTAAGCTTCTACCC,
nucleotides 2185 to 2205 of SEQ ID NO: 1). Primers for the
second round of PCR were the universal amplification primer and
TP α RACE primer 2 (5'-TTAATTGAATTCGCCTGTGTAGAACTACTTGTC-3',
nucleotides 2231 to 2251 plus an adapter sequence at the 5' end,
SEQ ID NO: 43). PCR conditions were an initial denaturation at
94 °C for 5 min followed by 25 cycles of 94 °C for 45 sec, 60 °C
for 45 sec, and 72 °C for 2 min, and a final extension at 72 °C
for 15 min, in a Perkin-Elmer 9600 thermocycler. PCR products
were cloned into pCR-Script SK(+) (Stratagene) for sequencing.

None of the previously isolated TP α cDNA clones extended to the
3' end of TP α mRNA. Therefore the complete 3' untranslated
regions were amplified by 3' RACE. Two sizes of 3' RACE cDNAs,
~0.3 and ~1.2 kb, were obtained, and completely sequenced (Fig.
13, SEQ ID NO: 19). The longer 3' RACE cDNA was generated from
an mRNA that was cleaved and poly-adenylated after nucleotide
3683-3685 (ambiguous because of adenines encoded in the gene).
Assuming a poly-A tail of ~250 nucleotides, this corresponds to
a TP α mRNA of ~3933 nucleotides, in excellent agreement with
the estimate of ~4.0 kb from Northern blots. A perfect match to
the consensus poly-A signal, AAUAAA [SEQ ID NO: 46], is present
~18 nucleotides 5' to the poly-adenylation site.
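
The mRNA length estimates in this section are the cleavage
position plus an assumed poly-A tail. A minimal sketch of that
arithmetic follows; the 250 nucleotide tail is the assumption
stated in the text, and the function name is illustrative.

```python
def mrna_length(cleavage_site, poly_a_tail=250):
    """Estimated mRNA length: transcribed nucleotides plus poly-A tail."""
    return cleavage_site + poly_a_tail


# Lower bounds of the two cleavage positions reported in the text.
for site in (3683, 2896):
    print(site, "->", mrna_length(site), "nucleotides")
```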

One mechanism for control of the formation of TP α mRNA is via
control of its 3'-end cleavage and polyadenylation before
splicing occurs [McKeown, Annu. Rev. Cell Biol., 8:133-155
(1992)]. This eliminates all the downstream exons encoding TP β
and γ sequences and removes them as competitors for the splicing
of exon 3 to the α-specific exon 4. The shorter 3' RACE cDNA was
generated from an mRNA that was cleaved and polyadenylated after
nucleotide 2896-2898, which would correspond to a total mRNA
length, including poly-A sequence, of ~3146 nucleotides. In
Northern blots, a TP mRNA of this size is only faintly detected,
consistent with the imperfect polyadenylation signal, AUUAAA
[SEQ ID NO: 48]. This is the most common deviation from the
consensus polyadenylation signal observed, and has previously
been observed to be used less efficiently [Keller, Biochem.,
61:419-440 (1992)].

Interestingly, an identical 11 nucleotide sequence, GAACAGTGNTGT
[SEQ ID NO: 44], is present at the two alternative
polyadenylation sites (Fig. 13, SEQ ID NO: 19) for TP α mRNA,
precisely at the two alternative 3' ends, which are separated by
more than 800 bp in the gene. Thus, this sequence may be a
binding site for a factor that regulates TP α mRNA 3'-end
formation, perhaps in a tissue-specific manner. Previous
analyses of requirements for mRNA 3'-end formation in several
genes have suggested that similar sequences downstream of the
AAUAAA [SEQ ID NO: 46] sequence may be important in 3'-end
selection [Keller, cited above (1992)].

C. Fluorescence in Situ Hybridization 
Fluorescence in situ hybridization was performed at Bios
Laboratories, Inc. λSHG-1 DNA was labeled with digoxigenin dUTP
by nick translation and hybridized to human metaphase chromosome
spreads in the presence of sheared human DNA to saturate
repetitive sequences. The probe was detected with FITC-labeled
anti-digoxigenin, and chromosomes were counterstained with
propidium iodide.

To confirm the identity of the labeled chromosome as chromosome
12, spreads were probed simultaneously with the TP probe and
with D12Z1, a probe containing sequences located in the
centromeric region of chromosome 12 [Looijenga et al,
Cytogenet. Cell Genet., 53:216-218 (1990)]. The distance between
the centromere and the TP hybridization signal was determined on
20 chromosomes. Measurements of the 20 hybridized chromosomes 12
showed that the TP gene is located 64% of the distance from the
centromere to the telomere of 12q, corresponding to band 12q22.

EXAMPLE 6 - EXPRESSION OF RECOMBINANT HUMAN TP IN BACTERIA

The open reading frames (ORFs) for recombinant human
thymopoietin cDNAs α, β, and γ [SEQ ID NOS: 1, 3, 5] have been
expressed in E. coli using inducible T7 RNA polymerase-dependent
pET expression vectors [Novagen; Studier et al, Meth. Enzymol.,
185:60-89 (1990)] as follows.

To construct an hTPα expression vector, the ORF was amplified by
PCR from λhTP-T.32 and an overlapping BamHI/HindIII fragment
from λhTP-T.153. Primers that introduced an Nhe I site at the 5'
end and an Xho I site at the 3' end were used, allowing
insertion into the vector pET-17b (Novagen) between the Nhe I
and Xho I sites. This construct, called pEThTPα, pETTII, or
pET17b-hTPα, encodes hTPα as a fusion protein with three
additional amino acids, Met Ala Ser, at the amino terminus,
followed by the hTPα sequence [SEQ ID NO: 2] beginning with Met
Pro Glu.

To construct an hTPβ expression vector, the open reading frame
was amplified by PCR from λT.17, using primers that introduced
an Nde I site at the 5' end and a BamHI site at the 3' end,
allowing ligation into the vector pET-3a. The resulting
expression plasmid, called pEThTPβ, pETTIa or pET3ahTPβ, encoded
hTPβ [SEQ ID NO: 4] and contained no additional amino acids.

pEThTPγ, pETTIb, or pET3ahTPγ was constructed as described for
pEThTPβ, except the open reading frame was amplified from
λhTP-T.206.
For expression, the plasmids were transformed into E. coli
strain BL21(DE3) [Novagen] which contains the T7 RNA polymerase
gene integrated into the chromosome and under the control of the
lacUV5 promoter. Induction of transcription from the lacUV5
promoter by addition of isopropyl β-D-thiogalactoside [IPTG;
Gibco-BRL] produces the T7 RNA polymerase, which in turn
transcribes the hTP genes which are under the control of a T7
RNA polymerase-dependent promoter. Cells were grown in M9 medium
supplemented with 1% casamino acid [Difco] and 100 μg/ml
ampicillin or carbenicillin [Sigma]. When the cell density
reached an optical density of 0.3 to 0.5 at 600 nm (at
approximately 4 hours), the T7 RNA polymerase was induced by
addition of IPTG to 1 mM, and the cells were grown for an
additional 4 hours or overnight.

To confirm that the bacteria had been transformed
with the appropriate plasmids and that the correct
proteins were being produced, lysates of E. coli strains
expressing the recombinant TPs were compared to lysates
of the human T cell line CEM [American Type Culture
Collection, ATCC #CCL 119]. Mammalian cell extracts were
prepared by lysing cells in 1% NP-40, 20 mM Tris-HCl pH
7.5, 150 mM NaCl, 1 mM EDTA, 0.1 mM EGTA, 0.5 mM DTT,
plus the following protease inhibitors (Boehringer
Mannheim): 10 µg/ml aprotinin, 0.3 mM pepstatin, 0.1 mM
Pefabloc, 1 µg/ml E-64. After passage through a 27 gauge
needle 10 times to reduce viscosity and centrifugation to
remove insoluble material, sample buffer [U. Laemmli,
Nature, 227:680-685 (1970)] was added. E. coli extracts
were prepared by direct lysis in sample buffer. Proteins
were separated by SDS-PAGE in 10% gels (Novex) buffered
with tricine, under reducing conditions. Proteins were
transferred to nitrocellulose (Novex), and TPs were
detected after incubation with an affinity-purified
rabbit antiserum raised against a synthetic peptide
consisting of amino acids 1 to 19 of the common amino
terminal region of TPs α, β and γ [2-20 of SEQ ID NO: 2,
4, 6] and peroxidase-linked goat anti-rabbit Ig (Pierce)
using an enhanced chemiluminescence system (Amersham).



The molecular masses of the TPs α, β and γ [SEQ ID
NOS: 2, 4, 6] were determined by comparison to marker
proteins in separate experiments. Recombinant hTPs α, β
and γ [SEQ ID NOS: 2, 4, 6] produce 75 kDa, 51 kDa and 39
kDa proteins that co-migrated with the major thymopoietin
proteins expressed in the human T cell line CEM. See
Example 7.

EXAMPLE 7 - CHARACTERIZATION OF TP PROTEINS

A. Western Blot Analysis

Recombinant TP α, β, and γ [SEQ ID NOS: 2, 4, 6]
expressed in E. coli were compared with the TP proteins
expressed in the human T cell line CEM by immunoblotting
as described above.

CEM cells express three major intracellular proteins
detected with an antiserum against TP amino acids 1-19,
with apparent molecular masses of 75, 51, and 39 kDa. The
75 kDa, 51 kDa, and 39 kDa CEM proteins are the sizes
predicted from the cDNA sequences for TPs α, β, and γ,
respectively, and co-migrate with recombinant TPs α, β,
and γ.

B. Northern Blot Analysis

Poly(A)+ RNA from the human T cell line CEM (ATCC)
was prepared by extraction with acid guanidinium
thiocyanate-phenol-chloroform [P. Chomczynski et al,
Anal. Biochem., 162:156-159 (1987)] using RNAzol
(Cinna/Biotecx), followed by selection on oligo-dT
columns as described [Sambrook et al, Molecular Cloning:
A Laboratory Manual, 2nd Edit., Cold Spring Harbor
Laboratories, Cold Spring Harbor, NY (1989)].

C. TP mRNAs in T Cell Lines

Probes for detection of TP mRNAs were partially
overlapping oligonucleotides that were radiolabeled by
extension of 3' ends to generate the complete double
stranded sequence. Oligonucleotide sequences used were
the sense and antisense (complementary) sequences as
follows:

α/β/γ sense: nucleotides 1 to 87 [208 to 294 of SEQ
ID NO: 1, 3, 5], antisense: 156 to 64 [363 to 271 of SEQ
ID NO: 1, 3, 5];

α-specific sense: 1488 to 1587 [1695 to 1795 of SEQ
ID NO: 1], antisense: 1587 to 1570 [1794 to 1777 of SEQ ID
NO: 1];

β-specific sense: 849 to 898 [1089 to 1139 of SEQ ID
NO: 4], antisense: 929 to 879 [1169 to 1119 of SEQ ID NO:
3];

β/γ-specific sense: 1286 to 1330 [1527 to 1571 of
SEQ ID NO: 3], antisense: 1365 to 1316 [1605 to 1556 of
SEQ ID NO: 3].

Three distinct major human TP mRNAs, estimated to be
4.4 kb, 4.1 kb, and 4.0 kb, were detected in CEM cells.
All three mRNAs were detected when blots were probed with
an oligonucleotide containing sequences encoding amino
acids 1 to 52 of the human TPs [2-53 of SEQ ID NO: 2, 4,
6], sequences that are present in TPs α, β, and γ. As
none of the cDNAs isolated contain complete 3'
untranslated regions, the lengths of TP α, β, and γ mRNAs
could not be determined simply from the lengths of the
cDNAs. Only the ~4.4 kb mRNA was detected with the
β-specific probe, only the ~4.0 kb mRNA was detected with
the α-specific probe, and the ~4.1 kb mRNA was detected
with the β/γ-specific probe but not with α-specific or
β-specific probes, suggesting that the 4.4 kb mRNA encodes
TPβ, the 4.0 kb mRNA encodes TPα, and the 4.1 kb mRNA
encodes TPγ.
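
The assignment logic in the preceding paragraph can be sketched in a few lines of Python. This is an illustrative sketch only: the probe labels ("common", "alpha", "beta", "beta/gamma") and the function name are hypothetical, and the band sizes and probe reactivities are simply those reported in the text above.

```python
def assign_isoform(positive_probes):
    """Infer which TP isoform an mRNA band encodes from the probes that detect it.

    Rule of thumb taken from the text: an alpha-specific signal marks TP alpha,
    a beta-specific signal marks TP beta, and a band seen only with the
    beta/gamma probe (and the common probe) is taken to be TP gamma.
    """
    probes = set(positive_probes)
    if "alpha" in probes:
        return "alpha"
    if "beta" in probes:
        return "beta"
    if "beta/gamma" in probes:
        return "gamma"
    return None  # detected only by the common probe, or not at all


# Band size in kb -> probes reported to detect it in CEM cell RNA.
observed = {
    4.4: ["common", "beta"],
    4.0: ["common", "alpha"],
    4.1: ["common", "beta/gamma"],
}

assignments = {size: assign_isoform(probes) for size, probes in observed.items()}
for size, isoform in sorted(assignments.items(), reverse=True):
    print(f"{size} kb mRNA -> TP {isoform}")
```

The order of the checks matters: the γ assignment is reached only when the isoform-specific probes are negative, which mirrors the reasoning "detected with the β/γ-specific probe but not with α-specific or β-specific probes".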

D. Expression of TP mRNAs in adult and fetal tissues

Poly(A)+ RNA from human tissues and blots of human
tissue mRNAs were purchased from Clontech. Glyoxylated
poly(A)+ RNAs were separated on 1% agarose gels and
blotted to nylon membranes (Gibco BRL). Hybridization
and washing conditions were as described in Sambrook et
al, cited above. Sizes of mRNAs were determined by
comparison to RNA size markers (Gibco BRL).

TP mRNAs were detected in all tissues examined, with
highest expression in adult thymus and in fetal liver. In
some tissues, TP mRNAs of slightly different sizes than
the thymus mRNAs were resolved when electrophoresis times
were extended. Whether such differences result from
different 5' or 3' untranslated regions or additional
distinct patterns of alternative splicing of coding exons
is not yet known. Expression of TPs α, β, and γ [SEQ ID
NOS: 2, 4, 6] in many tissues, with particularly high
expression in thymus, has also been observed in rodents,
and initial analysis of rat TP cDNAs suggests a high
level of sequence conservation between rat and human TP
α, consistent with important functions of TPs in thymus
and other tissues.

EXAMPLE 8 - EXPRESSION OF RECOMBINANT HUMAN TP IN
MAMMALIAN CELLS

hTPs α, β, and γ [SEQ ID NOS: 2, 4, 6] were
expressed in mammalian cells by PCR amplification of the
open reading frames and insertion into the mammalian
expression vector pCMV6, a derivative of pCMV1 [S.
Andersson et al, J. Biol. Chem., 264:8222-8229 (1989)],
between the Kpn I and Sal I sites for TPα and the Kpn I
and Not I sites for TPβ and TPγ. The resulting vectors
are transfected into human embryonal kidney 293 cells
[American Type Culture Collection, Accession #CRL 1573]
by conventional techniques using calcium phosphate
precipitation. The transfected cells are cultured in
DMEM medium at 37 °C until confluent.

The proteins are then isolated from the cell culture
by lysis and conventional purification techniques and
authenticated by Western blotting and SDS/PAGE.

EXAMPLE 9 - PRODUCTION OF SITE-SPECIFIC ANTIBODIES TO THE
HTP SEQUENCE - SYNTHESIS OF (HTP 1-19)8-LYSINE CORE

The antibodies described below were found to be
capable of recognizing the specific peptide sequence
within a larger synthetic peptide fragment or natural HTP
molecule.

An octameric branched lysine lattice was synthesized
as described [Posnett et al, J. Biol. Chem., 263:1719-
1725 (1988)] and the protected hTP 1-19 fragment was
synthesized by growth from both the α- and ε-amino
groups. An Applied Biosystems model 430A peptide
synthesizer was used employing standard protocols and
software version 1.4. All amino acids were double-coupled
and the end-NH2 program was used to remove the
terminal Boc-groups. The protected peptide-resin was
treated with liquid hydrogen fluoride, in the presence of
p-cresol, p-thiocresol, and dimethyl sulfide as
scavengers, at 0°C for 1 hour with constant stirring.
Excess HF was removed by vacuum and the residue treated
with ether to remove scavenger products. The peptide was
extracted (3 x 50 mL) with 50% acetic acid, the
solvents evaporated in vacuo, and the product
freeze-dried.
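
The branching arithmetic behind an octameric lysine lattice can be illustrated with a short Python sketch. It assumes, as the text describes, that each lysine offers two growth points (its α- and ε-amino groups), so every coupling level doubles the number of free amino groups; the function name is hypothetical.

```python
def lysine_core(levels):
    """Return (peptide_copies, lysine_residues) for a branched lysine core.

    Each lysine exposes two amino groups (alpha and epsilon), so a core built
    with `levels` successive lysine couplings ends in 2**levels free amino
    groups and contains 1 + 2 + ... + 2**(levels - 1) lysines in total.
    """
    if levels < 1:
        raise ValueError("at least one lysine level is required")
    copies = 2 ** levels
    lysines = 2 ** levels - 1
    return copies, lysines


copies, lysines = lysine_core(3)
print(f"{copies} peptide copies on a core of {lysines} lysines")
```

Three levels give eight peptide copies on seven lysines, which agrees with the octameric lattice described here and with the (peptide)8K7G notation used for the multiple antigenic peptides in Example 11.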

The crude peptide was initially purified on an
Amberlite IRA-68 ion-exchange column; further
purification was accomplished by reversed-phase HPLC on a
preparative C18 column. The solvents used were: water
containing 0.1% trifluoroacetic acid (TFA) (buffer A) and
CH3CN-H2O (4:1) containing 0.1% TFA (buffer B). A linear
gradient of 15-30% buffer B over 100 minutes was used.
The appropriate fractions containing the peptide were
pooled, the solvents evaporated in vacuo, and the product
freeze-dried. The purified peptide gave satisfactory
amino acid analysis. This peptide was used as an
immunogen to raise antibodies as described in Example 10.
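
As a worked illustration of the gradient described above, the following Python sketch computes the buffer B percentage and the approximate acetonitrile content of the mobile phase at a given time. It assumes the 4:1 CH3CN-H2O ratio is by volume (so buffer B is treated as 80% acetonitrile), ignores the 0.1% TFA and any volume change on mixing, and uses hypothetical function names.

```python
def percent_buffer_b(t_min, start=15.0, end=30.0, duration=100.0):
    """Percent buffer B at time t_min for a linear gradient, clamped to its ends."""
    t = min(max(t_min, 0.0), duration)
    return start + (end - start) * t / duration


def percent_acetonitrile(t_min):
    """Approximate % (v/v) acetonitrile, treating buffer B as 4 parts CH3CN in 5."""
    return percent_buffer_b(t_min) * 4 / 5


for t in (0, 50, 100):
    print(f"t = {t:3d} min: {percent_buffer_b(t):.1f}% B, "
          f"~{percent_acetonitrile(t):.1f}% acetonitrile")
```

Under these assumptions the gradient rises by 0.15% buffer B per minute, which corresponds to roughly 12% to 24% acetonitrile over the 100 minute run.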

EXAMPLE 10 - GENERATION AND ROUTINE TESTING OF ANTISERA
AGAINST SPECIFIC PROTEIN SEQUENCES

A. Generation of Antiserum

In order to produce reagents for use in immunoassays
for both research and clinical diagnostic purposes,
animals, usually rabbits, are repeatedly exposed to a
compound in order to initiate an immune response that
results in the formation of specific antibodies against
that substance. By selecting specific regions of the hTP
protein, e.g., those peptides disclosed in Table I above,
and synthesizing these regions as smaller peptides,
antibodies can be generated that specifically recognize
the selected peptide and the larger hTP as well.

To greatly increase the antigenicity of the selected
hTP peptide and assure greater exposure of the sequence
of interest, a polylysine core compound is designed which
employs the multiple reactive sites on lysine to create a
network of lysine molecules with repeats of the small hTP
peptide as the final layer. Thus the odds of antibodies
being generated against the specified hTP peptide
sequence are greatly enhanced.

Antisera are produced by injecting emulsions
comprised of the polylysine core compound and an adjuvant
into laboratory animals, preferably rabbits or sheep
(mice are preferred for monoclonal preparation), to help
stimulate the immune response. The injections are given
in multiple sites and at regular time intervals in order
to create repeated exposures from several routes. After
sufficient exposure to stimulate an immune response,
e.g., about 40 days, sera are collected from the rabbits
and tested for the presence of antibodies against the
injected peptide sequence.

B. Testing of Antiserum Titers

Enzyme-Linked Immunoassay (ELISA): In order to
determine the concentration of specific antibodies
present in the sera against the peptide of interest,
serial dilutions of the test sera are added to wells of a
microtiter plate that has been coated with the peptide
used to generate the antiserum. After allowing time for
the antibodies to bind to the coated peptide, the unbound
sera are washed from the plate. A solution containing
enzyme-linked antibodies that recognize immunoglobulins
of the species in which the antisera were generated (e.g.,
anti-rabbit IgG antibodies) is added to the wells. These
"anti-rabbit" antibodies bind to the rabbit antibodies
that are bound to the peptide-coated plate; thus, enzyme
molecules (horseradish peroxidase) are effectively placed
at each site where an antibody initially bound to the
peptide-coated plate. The unbound "anti-rabbit"
antibodies are then washed from the plate. A substrate,
which when converted by the enzyme to a different
molecular form results in a color reaction, is added to
the wells.

The intensity of the color change is quantitated and
used to determine the relative concentration of
antibodies that bound to the peptide-coated plate. For
purposes of comparison, the amount of antibody present
(titer) is expressed as the concentration of antiserum
required to produce a final color reaction with optical
density of 1.0. This intensity generally represents a
50% maximal response. Antisera showing sufficient titer
are further characterized to determine both their full
specificity and their utility in the various
immuno-applications.
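
One way to read a titer off a dilution series is sketched below in Python. This is not taken from the specification: it assumes the titer is reported as the reciprocal dilution at which the optical density crosses 1.0, it interpolates linearly in log10(dilution) between the two bracketing wells, and the dilution series and OD readings are invented purely for illustration.

```python
import math


def titer_at_od(reciprocal_dilutions, ods, target=1.0):
    """Estimate the reciprocal dilution giving `target` OD.

    Interpolates linearly in log10(dilution) between the two adjacent
    dilutions whose OD readings bracket the target.
    """
    pairs = sorted(zip(reciprocal_dilutions, ods))
    for (d_lo, od_lo), (d_hi, od_hi) in zip(pairs, pairs[1:]):
        if od_lo >= target >= od_hi and od_lo != od_hi:
            frac = (od_lo - target) / (od_lo - od_hi)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    raise ValueError("target OD is not bracketed by the dilution series")


# Hypothetical readings: OD falls as the serum is diluted further.
dilutions = [1e5, 1e6, 1e7]
readings = [1.8, 1.2, 0.4]
titer = titer_at_od(dilutions, readings)
print(f"estimated titer: {titer:.2e}")
```

Interpolating on a log scale is a common choice for serial dilutions because the wells are evenly spaced in log10(dilution) rather than in dilution itself.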

C. Results

Rabbits immunized with multiple antigenic peptides
corresponding to amino acid sequences derived from the
cDNAs of the invention yielded antisera with the
following titers (titer yielding 1.0 OD unit by ELISA):



Peptides              Titers          SEQ ID NO

αβγ  1-19             8 x 10^6        (2-20)     2, 4, 6
αβγ  28-39            4 x 10^5        (29-40)    2, 4, 6
αβγ  29-50            1.2 x 10^7      (30-51)    2, 4, 6
αβγ  40-52            1.6 x 10^7      (41-53)    2, 4, 6
αβγ  56-71            1.5 x 10^6      (57-72)    2, 4, 6
αβγ  92-108           8 x 10^6        (93-109)   2, 4, 6
αβγ  168-187          2 x 10^5        (169-188)  2, 4, 6
α    233-253          2 x 10^6        (234-254)  2
α    342-362          8 x 10^6        (343-363)  2
α    425-443          2.5 x 10^5      (426-444)  2
α    518-538          3 x 10^[exponent illegible]  (519-539)  2
α    604-622          1.5 x 10^6      (605-623)  2
α    188-197          2.5 x 10^5      (189-198)  2
     196-215          1 x 10^6        (197-216)  4, 6
     247-265          6 x 10^6        (248-266)  4
β    312-329          3 x 10^6        (313-330)  4
βγ   332-348          3 x 10^6        (333-349)  4
                                      (224-240)  6
βγ   397-412          3 x 10^6        (398-413)  4
                                      (289-304)  6

EXAMPLE 11 - PREPARATION OF MONOCLONAL ANTIBODIES
SPECIFIC FOR THYMOPOIETIN

A. Immunization

Synthetic peptide sequences (derived from the
predicted protein sequences of each of three thymopoietin
cDNAs) of approximately 20 amino acid residues were built
on a branched core of seven lysine residues according to
the method of Tam [see, e.g., Posnett et al, cited
above]. These structures are referred to as multiple
antigenic peptides (MAP). In particular, mice were
immunized with the HTPαβγ sequence specified by residues
29-50 (GEQRKDVYVQLYLQHLTARNRP)8K7G [30-51 of SEQ ID NO:
2, 4, 6].
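
The two residue numberings used throughout (peptide positions counted from the Pro that follows the initiator Met, versus SEQ ID NO positions counted from the Met) differ by one. The Python sketch below illustrates the conversion using the amino-terminal residues shown in FIGURE 1A, written in one-letter code; the variable and function names are hypothetical.

```python
# First 57 residues of the common amino-terminal region as shown in FIGURE 1A,
# starting with the initiator Met (position -1 in the figure, 1 in the SEQ ID).
N_TERMINUS = "MPEFLEDPSVLTKDKLKSELVANNVTLPAGEQRKDVYVQLYLQHLTARNRPPLPAGT"


def to_seq_id_range(start, end):
    """Convert peptide numbering (Pro = 1) to SEQ ID NO numbering (Met = 1)."""
    return start + 1, end + 1


def peptide(start, end):
    """Return the peptide spanning start..end in peptide numbering (Pro = 1)."""
    if start < 1 or end < start or end >= len(N_TERMINUS):
        raise ValueError("range outside the residues listed in N_TERMINUS")
    # With Met at Python index 0, peptide position i sits at index i.
    return N_TERMINUS[start:end + 1]


print(peptide(29, 50), to_seq_id_range(29, 50))
```

Run on residues 29-50, the sketch returns the 22-residue immunogen sequence quoted above and the SEQ ID NO range 30-51, consistent with the bracketed numbering in the text.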

Balb/c mice, 8-12 weeks of age, were injected with
50 µg of MAP suspended in 200 µl of adjuvant, which was
divided between the subcutaneous and peritoneal routes.
The adjuvant for the first injection was either Ribi™
(Ribi ImmunoChem, Hamilton, MT) or complete Freund's
adjuvant. For subsequent injections, Ribi™ or incomplete
Freund's adjuvant was used. A minimum of four injections
(but more often 6-10) were given at no less than two week
intervals. Sera were collected from animals 5 days
following a booster injection in order to monitor
antibody response. The reactivity of test sera with the
specific MAP immunogen was measured by ELISA. Sera with
high titers to the specific MAP were tested by western
blot for binding to the native TP present in lysates of
the T cell line CEM [ATCC; CCL 119]. Only mice which had
serum showing high titers to the specific MAP and
detectable binding to native TP were considered for
fusion.

B. Fusion

Splenocytes from immunoresponsive mice, in
particular, a mouse immunized with HTP 29-50 MAP, were mixed
with P3X63Ag8U1 (HGPRT- myeloma) cells [obtained from Dr.
Matthew D. Scharff, Einstein University, Bronx, NY] at a
ratio of 1:1. Cell fusion was accomplished by treating
the pelleted cells with 40% polyethylene glycol 4000
essentially as described in G. Kohler and C. Milstein,
Nature, 256:495 (1975). Hybridomas were grown in HAT
selection medium as 1000 independent cultures and
supernatants from the cultures were screened for
TP-specific MAb production about 2 weeks after fusion.

C. Hybridoma Selection

Selection for hybridomas producing TP-specific
monoclonal antibodies was achieved by testing culture
supernatants in ELISA systems in which the antigen on the
plate was either bovine serum albumin (BSA) or the
immunizing peptide. Supernatants negative for BSA and
positive for the immunizing peptide were tested on
additional synthetic peptides or enriched preparations of
native TP, and the hybridomas producing supernatants
positive for only HTP 29-50 containing synthetic peptides
and the TP-enriched native materials were chosen for
subcloning. Hybridoma clones arising from a single cell
were isolated by two successive rounds of limit dilution
plating. For the HTPαβγ 29-50 lysine core immunogen, three
independent hybridomas (885-1.7B8, 885-1.6E10 & 885-1.1C6)
were identified and cloned.
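
The first screening step (negative on BSA, positive on the immunizing peptide) amounts to a simple filter over the ELISA readings. The Python sketch below is illustrative only: the well names, OD values, and the 0.2 OD cutoff are hypothetical and are not taken from the specification.

```python
def select_wells(readings, cutoff=0.2):
    """Return wells whose supernatant is BSA-negative and peptide-positive.

    `readings` maps a well name to a (bsa_od, peptide_od) pair; the cutoff
    separating negative from positive is an assumed value.
    """
    return sorted(
        well
        for well, (bsa_od, peptide_od) in readings.items()
        if bsa_od < cutoff and peptide_od >= cutoff
    )


# Hypothetical ELISA readings for four hybridoma culture supernatants.
readings = {
    "A1": (0.05, 1.40),  # specific: kept
    "A2": (0.90, 1.10),  # binds BSA as well: rejected as non-specific
    "A3": (0.04, 0.06),  # no antibody detected: rejected
    "A4": (0.10, 0.75),  # specific: kept
}

candidates = select_wells(readings)
print(candidates)
```

Wells passing this filter would then go on to the second-round tests on additional synthetic peptides and enriched native TP described above.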

D. MAb characterization

Anti-TP monoclonal antibodies, purified from murine
ascites fluid, were shown to be specific for native TP by
the immunostaining profile observed on western blots of
cell lysates prepared from the early T cell line CEM.
Three proteins of apparent molecular sizes of 75 kDa, 51
kDa and 39 kDa (the sizes predicted by the TP cDNA
sequences and verified by expression of the TP cDNAs in
E. coli) were detected by the anti-HTP 29-50 monoclonal
antibodies. Preincubation of the antibodies with the
synthetic HTP 29-50 peptide, but not with an irrelevant
synthetic peptide, resulted in the loss of immunostaining
of the protein bands. This suggests that the protein
bands recognized by the monoclonal antibodies are TP
proteins.

E. Other TP-specific MAbs

Other monoclonal antibodies specific for one or more
of the TP proteins were obtained by immunization with
MAP immunogens. These include those reported in Table
III below.

TABLE III

MAP                MAb            TP Proteins

HTPαβγ 1-19        850-1.10A8     αβγ
                   850-1.10F8     αβγ
HTPβ 312-329       937-1.6611     β
                   937-1.2B11     β
HTPα 233-253       923-2.9F5      α

Numerous modifications and variations of the present
invention are included in the above-identified
specification and are expected to be obvious to one of
skill in the art. Such modifications and alterations to
the compositions and processes of the present invention
are believed to be encompassed in the scope of the claims
appended hereto.

WHAT IS CLAIMED IS:

1. A polynucleotide sequence encoding a
thymopoietin protein, said sequence isolated from the
cellular material with which it is naturally associated,
which is selected from the group consisting of

(a) SEQ ID NO: 1,

(b) SEQ ID NO: 3,

(c) SEQ ID NO: 5,

(d) a sequence complementary to any of
sequences (a) through (c),

(e) fragments of (a) through (d), wherein said
fragments are at least 15 nucleotides in length,

(f) sequences capable of hybridizing to (a)
through (e) under stringent hybridization conditions,

(g) sequences containing allelic variations of
(a) through (e), and

(h) sequences containing modifications of (a)
through (g).

2. A vector comprising a polynucleotide sequence
according to claim 1.

3. A host cell transformed by a vector according
to claim 2.

4. The host cell according to claim 3 wherein said
polynucleotide is operably linked to a heterologous
expression control sequence capable of directing the
expression of the protein encoded by said sequence in a
selected host cell.

5. The host cell according to claim 3 selected
from the group consisting of bacterial, fungal, insect,
and mammalian cells.

6. The host cell according to claim 5 wherein said
cell is E. coli.

7. A method for producing recombinant human 
thymopoietin comprising incubating a transformed host 
cell comprising the polynucleotide sequence of claim 1 
encoding human thymopoietin under conditions that allow 
expression of the human thymopoietin and recovering the 
thymopoietin therefrom. 

8. A method of producing recombinant human
thymopoietin comprising: 

(a) providing a host cell and an expression vector 
comprising the polynucleotide sequence of claim 1 
encoding human thymopoietin operably linked to an 
expression control sequence directing the expression of 
the human thymopoietin; 

(b) incubating the host cell under conditions which 
allow transfection of the host cell by the vector; and 

(c) recovering said recombinant human 
thymopoietin. 

9. The method according to claim 8 wherein said 
conditions permit the secretion of the human 
thymopoietin. 

10. An hTP protein selected from the group
consisting of:

(a) Type α SEQ ID NO: 2,

(b) Type β SEQ ID NO: 4,

(c) Type γ SEQ ID NO: 6,

(d) a fragment of (a) through (c), wherein
said fragment is at least 4 amino acids in length, and

(e) a modified protein sequence of (a) through
(d),

said protein having immunomodulatory activity, and
isolated from the cellular material with which it is
naturally associated.

11. The protein according to claim 10, wherein said
fragment is selected from the group consisting of amino 
acids: 

about 1 to about 20, about 2 to about 52, about 29
to about 40, about 30 to about 51, about 41 to about 53,
about 57 to about 72, about 93 to about 109, about 169 to
about 188 of SEQ ID NO: 2, 4 and 6;

about 189 to about 203, about 234 to about 254,
about 343 to about 363, about 426 to about 444, about 519
to about 539, about 605 to about 623, and about 189 to
about 198 of SEQ ID NO: 2;

about 197 to about 216 of SEQ ID NO: 4 and 6;

about 248 to about 266, about 313 to about 330 of
SEQ ID NO: 4; and

about 333 to about 349 and about 398 to about 413 of
SEQ ID NO: 4 and 6.

12. A pharmaceutical composition useful in treating
immune and nervous system conditions comprising a protein
according to claim 10 and a pharmaceutically acceptable
carrier.

13. A diagnostic reagent comprising a
polynucleotide sequence of claim 1 and a detectable
label.

14. A diagnostic reagent comprising a protein of 
claim 10. 

15. A method of modulating an immune and/or nervous
system condition in a patient having an imbalance of same
comprising administering to a patient in need thereof a
protein according to claim 10.

16. An antibody directed against an epitope of a 
protein according to claim 10. 

17. The antibody according to claim 16, wherein 
said antibody is selected from the group consisting of 
monoclonal, polyclonal, and recombinant antibodies. 

18. A diagnostic reagent comprising an antibody
directed against an epitope of a protein selected from 
the group consisting of: 

(a) Type α SEQ ID NO: 2,

(b) Type β SEQ ID NO: 4,

(c) Type γ SEQ ID NO: 6,

(d) a fragment of (a) through (c), wherein
said fragment is at least 4 amino acids in length, and

(e) a modified protein sequence of (a) through
(d).

FIGURE 1A

DNA and Amino Acid Sequences of TP α

GTTCGTAGTT CGGCTCTGGG GTCTTTTGTG TCCGGGTCTG GCTTGGCTTT GTGTCCGCGA 
GTTTTTGTTC CGCTCCGCAG CGCTCTTCCC GGGCAGGAGC CGTGAGGCTC GGAGGCGGCA 
GCGCGGTCCC CGGCCAGGAG CAAGCGCGCC GGCGTGAGCG GCGGCGGCAA AGGCTGTGGG 

-1 +1 

GAGGGGGCTT CGCAGATCCC CGAG ATG CCG GAG TTC CTG GAA GAC CCC TCG 

Met Pro Glu Phe Leu Glu Asp Pro Ser
-1 +1 5 

GTC CTG ACA AAA GAC AAG TTG AAG AGT GAG TTG GTC GCC AAC AAT GTG 
Val Leu Thr Lys Asp Lys Leu Lys Ser Glu Leu Val Ala Asn Asn Val 
10 15 20

ACG CTG CCG GCC GGG GAG CAG CGC AAA GAC GTG TAC GTC CAG CTC TAC 
Thr Leu Pro Ala Gly Glu Gln Arg Lys Asp Val Tyr Val Gln Leu Tyr
25 30 35 40 

CTG CAG CAC CTC ACG GCT CGC AAC CGG CCG CCG CTC CCC GCC GGC ACC 
Leu Gln His Leu Thr Ala Arg Asn Arg Pro Pro Leu Pro Ala Gly Thr

45 50 55 

AAC AGC AAG GGG CCC CCG GAC TTC TCC AGT GAC GAA GAG CGC GAG CCC 
Asn Ser Lys Gly Pro Pro Asp Phe Ser Ser Asp Glu Glu Arg Glu Pro 
60 65 70 

ACC CCG GTC CTC GGC TCT GGG GCC GCC GCC GCG GGC CGG AGC CGA GCA 
Thr Pro Val Leu Gly Ser Gly Ala Ala Ala Ala Gly Arg Ser Arg Ala 
75 80 85 

GCC GTC GGC AGG AAA GCC ACA AAA AAA ACT GAT AAA CCC AGA CAA GAA 
Ala Val Gly Arg Lys Ala Thr Lys Lys Thr Asp Lys Pro Arg Gln Glu
90 95 100 

GAT AAA GAT GAT CTA GAT GTA ACA GAG CTC ACT AAT GAA GAT CTT TTG 
Asp Lys Asp Asp Leu Asp Val Thr Glu Leu Thr Asn Glu Asp Leu Leu 
105 110 115 120 

GAT CAG CTT GTG AAA TAC GGA GTG AAT CCT GGT CCT ATT GTG GGA ACA 
Asp Gln Leu Val Lys Tyr Gly Val Asn Pro Gly Pro Ile Val Gly Thr

125 130 135 

ACC AGG AAG CTA TAT GAG AAA AAG CTT TTG AAA CTG AGG GAA CAA GGA 
Thr Arg Lys Leu Tyr Glu Lys Lys Leu Leu Lys Leu Arg Glu Gln Gly
140 145 150 

FIGURE 1B

ACA GAA TCA AGA TCT TCT ACT CCT CTG CCA ACA ATT TCT TCT TCA GCA 
Thr Glu Ser Arg Ser Ser Thr Pro Leu Pro Thr Ile Ser Ser Ser Ala
155 160 165 

GAA AAT ACA AGG CAG AAT GGA AGT AAT GAT TCT GAC AGA TAC AGT GAC 
Glu Asn Thr Arg Gln Asn Gly Ser Asn Asp Ser Asp Arg Tyr Ser Asp
170 175 180 

AAT GAA GAA GGA AAG AAG AAA GAA CAC AAG AAA GTG AAG TCC ACT AGG 
Asn Glu Glu Gly Lys Lys Lys Glu His Lys Lys Val Lys Ser Thr Arg 
185 * 190 195 200 

GAT ATT GTT CCT TTT TCT GAA CTT GGA ACT ACT CCC TCT GGT GGT GGA 
Asp Ile Val Pro Phe Ser Glu Leu Gly Thr Thr Pro Ser Gly Gly Gly

205 210 215 

TTT TTT CAG GGT ATT TCT TTT CCT GAA ATC TCC ACC CGT CCT CCT TTG 
Phe Phe Gln Gly Ile Ser Phe Pro Glu Ile Ser Thr Arg Pro Pro Leu
220 225 230 

GGC AGT ACC GAA CTA CAG GCA GCT AAG AAA GTA CAT ACT TCT AAG GGA 
Gly Ser Thr Glu Leu Gln Ala Ala Lys Lys Val His Thr Ser Lys Gly
235 240 245 

GAC CTA CCT AGG GAG CCT CTT GTT GCC ACA AAC TTG CCT GGC AGG GGA 
Asp Leu Pro Arg Glu Pro Leu Val Ala Thr Asn Leu Pro Gly Arg Gly 
250 255 260 

CAG TTG CAG AAG TTA GCC TCT GAA AGG AAT TTG TTT ATT TCA TGC AAG 
Gln Leu Gln Lys Leu Ala Ser Glu Arg Asn Leu Phe Ile Ser Cys Lys
265 270 275 280 

TCT AGC CAT GAT AGG TGT TTA GAG AAA AGT TCT TCG TCA TCT TCT CAG 
Ser Ser His Asp Arg Cys Leu Glu Lys Ser Ser Ser Ser Ser Ser Gln

285 290 295 

CCT GAA CAC AGT GCC ATG TTG GTC TCT ACT GCA GCT TCT CCT TCA CTG 
Pro Glu His Ser Ala Met Leu Val Ser Thr Ala Ala Ser Pro Ser Leu 
300 305 310 

ATT AAA GAA ACC ACC ACT GGT TAC TAT AAA GAC ATA GTA GAA AAT ATT 
Ile Lys Glu Thr Thr Thr Gly Tyr Tyr Lys Asp Ile Val Glu Asn Ile
315 320 325 

TGC GGT AGA GAG AAA AGT GGA ATT CAA CCA TTA TGT CCT GAG AGG TCC 
Cys Gly Arg Glu Lys Ser Gly Ile Gln Pro Leu Cys Pro Glu Arg Ser
330 335 340 

CAT ATT TCA GAT CAA TCG CCT CTC TCC AGT AAA AGG AAA GCA CTA GAA
His Ile Ser Asp Gln Ser Pro Leu Ser Ser Lys Arg Lys Ala Leu Glu
345 350 355 360 

FIGURE 1C

GAG TCT GAG AGC TCA CAA CTA ATT TCT CCG CCA CTT GCC CAG GCA ATC 
Glu Ser Glu Ser Ser Gln Leu Ile Ser Pro Pro Leu Ala Gln Ala Ile

365 370 375 

AGA GAT TAT GTC AAT TCT CTG TTG GTC CAG GGT GGG GTA GGT AGT TTG 
Arg Asp Tyr Val Asn Ser Leu Leu Val Gln Gly Gly Val Gly Ser Leu
380 385 390 

CCT GGA ACT TCT AAC TCT ATG CCC CCA CTG GAT GTA GAA AAC ATA CAG 
Pro Gly Thr Ser Asn Ser Met Pro Pro Leu Asp Val Glu Asn Ile Gln
395 400 405 

AAG AGA ATT GAT CAG TCT AAG TTT CAA GAA ACT GAA TTC CTG TCT CCT 
Lys Arg Ile Asp Gln Ser Lys Phe Gln Glu Thr Glu Phe Leu Ser Pro
410 415 420 

CCA AGA AAA GTC CCT AGA CTG AGT GAG AAG TCA GTG GAG GAA AGG GAT 
Pro Arg Lys Val Pro Arg Leu Ser Glu Lys Ser Val Glu Glu Arg Asp 
425 430 435 440 

TCA GGT TCC TTT GTG GCA TTT CAG AAC ATA CCT GGA TCC GAA CTG ATG 
Ser Gly Ser Phe Val Ala Phe Gln Asn Ile Pro Gly Ser Glu Leu Met

445 450 455 

TCT TCT TTT GCC AAA ACT GTT GTC TCT CAT TCA CTC ACT ACC TTA GGT 
Ser Ser Phe Ala Lys Thr Val Val Ser His Ser Leu Thr Thr Leu Gly 
460 465 470 

CTA GAA GTG GCT AAG CAA TCA CAG CAT GAT AAA ATA GAT GCC TCA GAA 
Leu Glu Val Ala Lys Gln Ser Gln His Asp Lys Ile Asp Ala Ser Glu
475 480 485 

CTA TCT TTT CCC TTC CAT GAA TCT ATT TTA AAA GTA ATT GAA GAA GAA 
Leu Ser Phe Pro Phe His Glu Ser Ile Leu Lys Val Ile Glu Glu Glu
490 495 500 

TGG CAG CAA GTT GAC AGG CAG CTG CCT TCA CTG GCA TGC AAA TAT CCA 
Trp Gln Gln Val Asp Arg Gln Leu Pro Ser Leu Ala Cys Lys Tyr Pro
505 510 515 520 

GTT TCT TCC AGG GAG GCA ACA CAG ATA TTA TCA GTT CCA AAA GTA GAT 
Val Ser Ser Arg Glu Ala Thr Gln Ile Leu Ser Val Pro Lys Val Asp

525 530 535 

GAT GAA ATC CTA GGG TTT ATT TCT GAA GCC ACT CCA CTA GGA GGT ATT 
Asp Glu Ile Leu Gly Phe Ile Ser Glu Ala Thr Pro Leu Gly Gly Ile
540 545 550 

CAA GCA GCC TCC ACT GAG TCT TGC AAT CAG CAG TTG GAC TTA GCA CTC 
Gln Ala Ala Ser Thr Glu Ser Cys Asn Gln Gln Leu Asp Leu Ala Leu
555 560 565 

FIGURE 1D

[DNA and amino acid sequences of TP α, continued from FIGURE 1C: codons from residue 569 through the stop codon; see SEQ ID NOS: 1 and 2. The 3' untranslated sequence follows.]

TAGTAAAATT
AAGGACAAAA AGACATCTAT CTTATCTTTC AGGTACTTTA TGCCAACATT TTCTTTTCTG 
TTAAGGTTGT TTTAGTTTCC AGATAGGGCT AATTACAAAA TGTTAAGCTT CTACCCATCA 
AATTACAGTA TAAAAGTAAT TGCCTGTGTA GAACTACTTG TCTTTTCTAA AGATTTGCGT 
AGATAGGAAG CCTG 

FIGURE 2A

DNA and Amino Acid Sequence of TP β

GGTTGGTGCG AGCTTCCAGC TTGGCCGCAG TTGGTTCGTA GTTCGGCTCT GGGGTCTTTT 
GTGTCCGGGT CTGGCTTGGC TTTGTGTCCG CGAGTTTTTG TTCCGCTCCG CAGCGCTCTT 
CCCGGGCAGG AGCCGTGAGG CTCGGAGGCG GCAGCGCGGT CCCCGGCCAG GAGCAAGCGC 
GCCGGCGTGA GCGGCGGCGG CAAAGGCTGT GGGGAGGGGG CTTCGCAGAT CCCCGAG 
-1 +1 

ATG CCG GAG TTC CTG GAA GAC CCC TCG GTC CTG ACA AAA GAC AAG TTG 
Met Pro Glu Phe Leu Glu Asp Pro Ser Val Leu Thr Lys Asp Lys Leu 
-1 +1 5 10 15

AAG AGT GAG TTG GTC GCC AAC AAT GTG ACG CTG CCG GCC GGG GAG CAG 
Lys Ser Glu Leu Val Ala Asn Asn Val Thr Leu Pro Ala Gly Glu Gln

20 25 30 

CGC AAA GAC GTG TAC GTC CAG CTC TAC CTG CAG CAC CTC ACG GCT CGC
Arg Lys Asp Val Tyr Val Gln Leu Tyr Leu Gln His Leu Thr Ala Arg
35 40 45 

AAC CGG CCG CCG CTC CCC GCC GGC ACC AAC AGC AAG GGG CCC CCG GAC 
Asn Arg Pro Pro Leu Pro Ala Gly Thr Asn Ser Lys Gly Pro Pro Asp 
50 55 60 

TTC TCC AGT GAC GAA GAG CGC GAG CCC ACC CCG GTC CTC GGC TCT GGG 
Phe Ser Ser Asp Glu Glu Arg Glu Pro Thr Pro Val Leu Gly Ser Gly 
65 70 75 

GCC GCC GCC GCG GGC CGG AGC CGA GCA GCC GTC GGC AGG AAA GCC ACA 
Ala Ala Ala Ala Gly Arg Ser Arg Ala Ala Val Gly Arg Lys Ala Thr 
80 85 90 95 

AAA AAA ACT GAT AAA CCC AGA CAA GAA GAT AAA GAT GAT CTA GAT GTA 
Lys Lys Thr Asp Lys Pro Arg Gln Glu Asp Lys Asp Asp Leu Asp Val
100 105 110 

ACA GAG CTC ACT AAT GAA GAT CTT TTG GAT CAG CTT GTG AAA TAC GGA 
Thr Glu Leu Thr Asn Glu Asp Leu Leu Asp Gln Leu Val Lys Tyr Gly
115 120 125 

GTG AAT CCT GGT CCT ATT GTG GGA ACA ACC AGG AAG CTA TAT GAG AAA 
Val Asn Pro Gly Pro Ile Val Gly Thr Thr Arg Lys Leu Tyr Glu Lys
130 135 140 
FIGURE 2B

AAG CTT TTG AAA CTG AGG GAA CAA GGA ACA GAA TCA AGA TCT TCT ACT 
Lys Leu Leu Lys Leu Arg Glu Gln Gly Thr Glu Ser Arg Ser Ser Thr
145 150 155 

CCT CTG CCA ACA ATT TCT TCT TCA GCA GAA AAT ACA AGG CAG AAT GGA 
Pro Leu Pro Thr Ile Ser Ser Ser Ala Glu Asn Thr Arg Gln Asn Gly
160 165 170 175 

AGT AAT GAT TCT GAC AGA TAC AGT GAC AAT GAA GAA GAC TCT AAA ATA
Ser Asn Asp Ser Asp Arg Tyr Ser Asp Asn Glu Glu Asp Ser Lys Ile

180 185 * 190 

GAG CTC AAG CTT GAG AAG AGA GAA CCA CTA AAG GGC AGA GCA AAG ACT 
Glu Leu Lys Leu Glu Lys Arg Glu Pro Leu Lys Gly Arg Ala Lys Thr 
195 200 205 

CCA GTA ACA CTC AAG CAA AGA AGA GTT GAG CAC AAT CAG AGC TAT TCT 
Pro Val Thr Leu Lys Gln Arg Arg Val Glu His Asn Gln Ser Tyr Ser
210 215 220 

CAA GCT GGA ATA ACT GAG ACT GAA TGG ACA AGT GGA TCT TCA AAA GGC 
Gln Ala Gly Ile Thr Glu Thr Glu Trp Thr Ser Gly Ser Ser Lys Gly

225 230 235 

GGA CCT CTG CAG GCA TTA ACT AGG GAA TCT ACA AGA GGG TCA AGA AGA 
Gly Pro Leu Gln Ala Leu Thr Arg Glu Ser Thr Arg Gly Ser Arg Arg

240 245 250 255 

ACT CCA AGG AAA AGG GTG GAA ACT TCA GAA CAT TTT CGT ATA GAT GGT 
Thr Pro Arg Lys Arg Val Glu Thr Ser Glu His Phe Arg Ile Asp Gly

260 265 270 

CCA GTA ATT TCA GAG AGT ACT CCC ATA GCT GAA ACT ATA ATG GCT TCA 
Pro Val Ile Ser Glu Ser Thr Pro Ile Ala Glu Thr Ile Met Ala Ser

275 280 285 

AGC AAC GAA TCC TTA GTT GTC AAT AGG GTG ACT GGA AAT TTC AAG CAT 
Ser Asn Glu Ser Leu Val Val Asn Arg Val Thr Gly Asn Phe Lys His

290 295 300 

GCA TCT CCT ATT CTG CCA ATC ACT GAA TTC TCA GAC ATA CCC AGA AGA 
Ala Ser Pro Ile Leu Pro Ile Thr Glu Phe Ser Asp Ile Pro Arg Arg

305 310 315 

GCA CCA AAG AAA CCA TTG ACA AGA GCT GAA GTG GGA GAA AAA ACA GAG 
Ala Pro Lys Lys Pro Leu Thr Arg Ala Glu Val Gly Glu Lys Thr Glu 

320 325 330 335 
FIGURE 2C

GAA AGA AGA GTA GAA AGG GAT ATT CTT AAG GAA ATG TTC CCC TAT GAA 

Glu Arg Arg Val Glu Arg Asp Ile Leu Lys Glu Met Phe Pro Tyr Glu

340 345 350 

GCA TCT ACA CCA ACA GGA ATT AGT GCT AGT TGC CGC AGA CCA ATC AAA 

Ala Ser Thr Pro Thr Gly Ile Ser Ala Ser Cys Arg Arg Pro Ile Lys
355 360 365 

GGG GCT GCA GGC CGG CCA TTA GAA CTC AGT GAT TTC AGG ATG GAG GAG 

Gly Ala Ala Gly Arg Pro Leu Glu Leu Ser Asp Phe Arg Met Glu Glu 
370 375 380 

TCT TTT TCA TCT AAA TAT GTT CCT AAG TAT GTT CCC TTG GCA GAT GTC 

Ser Phe Ser Ser Lys Tyr Val Pro Lys Tyr Val Pro Leu Ala Asp Val 
385 390 395 

AAG TCA GAA AAG ACA AAA AAG GGA CGC TCC ATT CCC GTA TGG ATA AAA 

Lys Ser Glu Lys Thr Lys Lys Gly Arg Ser Ile Pro Val Trp Ile Lys

400 405 410 415 

ATT TTG CTG TTT GTT GTT GTG GCA GTT TTT TTG TTT TTG GTC TAT CAA 

Ile Leu Leu Phe Val Val Val Ala Val Phe Leu Phe Leu Val Tyr Gln

420 425 430 

GCT ATG GAA ACC AAC CAA GTA AAT CCC TTC TCT AAT TTT CTT CAT GTT 

Ala Met Glu Thr Asn Gln Val Asn Pro Phe Ser Asn Phe Leu His Val
435 440 445 

GAC CCT AGA AAA TCC AAC TGAATGGTAT CTCTTTGGCA CGTTCAACTT 
Asp Pro Arg Lys Ser Asn 
450 

GGTCTCCTAT TTTCAATAAC TGTTGAAAAA CATTTGTGTA CACTTGTTGA CTCCAAGAAC 
TAAAAATAAT GTGATTTCGC CTCAATAAAT GTAGTATTTC ATTGAAAAGC AAAC 



FIGURE 3A 

DNA and Amino Acid Sequences of TP γ

CCCTGCTACC AAGGCCCAGC TATGGCCCCA GGGTTGAAAA GTTATGAGGG TCAGGGGTCT 
TTTGTGTCCG GGTCTGGCTT GGCTTTGTGT CCGCGAGTTT TTGTTCCGCT CCGCAGCGCT 
CTTCCCGGGC AGGAGCCGTG AGGCTCGGAG GCGGCAGCGC GGTCCCCGGC CAGGAGCAAG 
CGCGCCGGCG TGAGCGGCGG CGGCAAAGGC TGTGGGGAGG GGGCTTCGCA GATCCCCGAG
-1 +1 

ATG CCG GAG TTC CTG GAA GAC CCC TCG GTC CTG ACA AAA GAC AAG TTG 
Met Pro Glu Phe Leu Glu Asp Pro Ser Val Leu Thr Lys Asp Lys Leu 
-1 +1 5 10 15 

AAG AGT GAG TTG GTC GCC AAC AAT GTG ACG CTG CCG GCC GGG GAG CAG 
Lys Ser Glu Leu Val Ala Asn Asn Val Thr Leu Pro Ala Gly Glu Gln

20 25 30 

CGC AAA GAC GTG TAC GTC CAG CTC TAC CTG CAG CAC CTC ACG GCT CGC
Arg Lys Asp Val Tyr Val Gln Leu Tyr Leu Gln His Leu Thr Ala Arg
35 40 45 

AAC CGG CCG CCG CTC CCC GCC GGC ACC AAC AGC AAG GGG CCC CCG GAC 
Asn Arg Pro Pro Leu Pro Ala Gly Thr Asn Ser Lys Gly Pro Pro Asp 
50 55 60 

TTC TCC AGT GAC GAA GAG CGC GAG CCC ACC CCG GTC CTC GGC TCT GGG 
Phe Ser Ser Asp Glu Glu Arg Glu Pro Thr Pro Val Leu Gly Ser Gly 
65 70 75 

GCC GCC GCC GCG GGC CGG AGC CGA GCA GCC GTC GGC AGG AAA GCC ACA 
Ala Ala Ala Ala Gly Arg Ser Arg Ala Ala Val Gly Arg Lys Ala Thr 
80 85 90 95 

AAA AAA ACT GAT AAA CCC AGA CAA GAA GAT AAA GAT GAT CTA GAT GTA 
Lys Lys Thr Asp Lys Pro Arg Gln Glu Asp Lys Asp Asp Leu Asp Val
100 105 110 

ACA GAG CTC ACT AAT GAA GAT CTT TTG GAT CAG CTT GTG AAA TAC GGA 
Thr Glu Leu Thr Asn Glu Asp Leu Leu Asp Gln Leu Val Lys Tyr Gly
115 120 125 

GTG AAT CCT GGT CCT ATT GTG GGA ACA ACC AGG AAG CTA TAT GAG AAA 
Val Asn Pro Gly Pro Ile Val Gly Thr Thr Arg Lys Leu Tyr Glu Lys
130 135 140 
FIGURE 3B

AAG CTT TTG AAA CTG AGG GAA CAA GGA ACA GAA TCA AGA TCT TCT ACT 
Lys Leu Leu Lys Leu Arg Glu Gln Gly Thr Glu Ser Arg Ser Ser Thr
145 150 155 

CCT CTG CCA ACA ATT TCT TCT TCA GCA GAA AAT ACA AGG CAG AAT GGA 
Pro Leu Pro Thr Ile Ser Ser Ser Ala Glu Asn Thr Arg Gln Asn Gly
160 165 170 175 

AGT AAT GAT TCT GAC AGA TAC AGT GAC AAT GAA GAA GAC TCT AAA ATA
Ser Asn Asp Ser Asp Arg Tyr Ser Asp Asn Glu Glu Asp Ser Lys Ile
180 185 * 190 

C 

GAG CTT AAG CTT GAG AAG AGA GAA CCA CTA AAG GGC AGA GCA AAG ACT 

Glu Leu Lys Leu Glu Lys Arg Glu Pro Leu Lys Gly Arg Ala Lys Thr 

195 200 205 

CCA GTA ACA CTC AAG CAA AGA AGA GTT GAG CAC AAT CAG GTG GGA GAA 
Pro Val Thr Leu Lys Gln Arg Arg Val Glu His Asn Gln Val Gly Glu
210 215 220 

AAA ACA GAG GAA AGA AGA GTA GAA AGG GAT ATT CTT AAG GAA ATG TTC 
Lys Thr Glu Glu Arg Arg Val Glu Arg Asp Ile Leu Lys Glu Met Phe
225 230 235 

CCC TAT GAA GCA TCT ACA CCA ACA GGA ATT AGT GCT AGT TGC CGC AGA 
Pro Tyr Glu Ala Ser Thr Pro Thr Gly Ile Ser Ala Ser Cys Arg Arg
240 245 250 255 

CCA ATC AAA GGG GCT GCA GGC CGG CCA TTA GAA CTC AGT GAT TTC AGG 
Pro Ile Lys Gly Ala Ala Gly Arg Pro Leu Glu Leu Ser Asp Phe Arg
260 265 270 

ATG GAG GAG TCT TTT TCA TCT AAA TAT GTT CCT AAG TAT GTT CCC TTG 
Met Glu Glu Ser Phe Ser Ser Lys Tyr Val Pro Lys Tyr Val Pro Leu 
275 280 285 

GCA GAT GTC AAG TCA GAA AAG ACA AAA AAG GGA CGC TCC ATT CCC GTA 
Ala Asp Val Lys Ser Glu Lys Thr Lys Lys Gly Arg Ser Ile Pro Val
290 295 300 

TGG ATA AAA ATT TTG CTG TTT GTT GTT GTG GCA GTT TTT TTG TTT TTG 
Trp Ile Lys Ile Leu Leu Phe Val Val Val Ala Val Phe Leu Phe Leu
305 310 315 

GTC TAT CAA GCT ATG GAA ACC AAC CAA GTA AAT CCC TTC TCT AAT TTT 
Val Tyr Gln Ala Met Glu Thr Asn Gln Val Asn Pro Phe Ser Asn Phe
320 325 330 335 

CTT CAT GTT GAC CCT AGA AAA TCC AAC TGAATGGTAT CTCTTTGGCA
Leu His Val Asp Pro Arg Lys Ser Asn 

340 
FIGURE 3C

CGTTCAACTT GGTCTCCTAT TTTCAATAAC TGTTGAAAAA CATTTGTGTA CACTTGTTGA
CTCCAAGAAC TAAAAATAAT GTGATTTCGC CTCAATAAAT GTAGTATTTC ATTGAAAAGC
AAACAAAATA TATATAAATG GACTTCATTA AAATGTTTTT GAACTTTGGA CTAGTAGGAG
ATCACTTTGT GCCATATGAA TAATCTTTTT TAGCTCTGGA ACTTTTTGTA GGCTTTATTT
TTTTAATGTG GGCATCTTAT TTCATTTTTG AAAAAATGTA TATGTTTTTT GTGTATTTGG
GAAACGAAGG GTGAAACATG GTAGTATAAT GTGAAGCTAC ACATTTAAAT ACTTAGAATT
CTTACAGAAA AGATTTTAAG AATTATTCTC TGCTGAATAA AAACTGCAAA TATGTGAAAC
ATAATGAAAT TCAGTAAGAG GAAAAGTAAC TTGGTTGTAC TTTTTGTAAC TGCAACAAAG
TTTGATGGTG TTTATGAGGA AAAGTACAGC AATAATCTCT TCTGTAACCT TTATTAATAG
TAATGTTGTT GTAGCCCTAT CATACTCACT TTTTAAGACA CAGTATCATG AAAGTCCTAT
TTCAGTAAGA CCCATTTACA TACAGTAGAT TTTTAGCAGA GATCTTTTAG TGTAACATAC
ATATTTTAGA GAATTGTTGG CTAGCTGTAC ATGTTTTGAA AAGCTGTTTA GCTAGCTATA
AGGCTATAAT TGGAAATTTG TATTTTTTAT TTACAGCAAA ACATTTATTC AGTCATCCAG
TTTGCTACCA AAATATGTTT TAGATAAGTG TGTGTATGTT TGTTTAGAAG TTAGAAATTG
TAAACACTGG TCTTATGTTT CATTTGGATT CATTATTGCA TTGTCTTGTT ACCAGAAACA
AATTTTGCCG AGCTTTTTTT GCCCTATATT TCCCAGCATA ATTTGATTAG AAAGTACAAA
AAGGGCCGGG CGCGGTGGCT TACGCCTGTA ATCCCAGCAC TTTGGGAGGC CAGGGCGGGT
GGATCACGAG GTCAGGAGAT CGGGACCATC CTGGCCAACA TGGTGAAACC CCGTCTCTAC
TAAAAAAAAA AAAAAAA
FIG. 4A
FIGURE 6A

GAATTCAGAT AGAATGTAGA CAAGAGGGAT GGTGAGGAAA ACCTACGGCA AGCAGCTCTA

60

TAATTTCTGG AAAGTGACCA GGAGGCTCGT GGATGATGGC AGGAAGGAGA GTAGAAAGTG

120

ATTTAGCCAG AAGATATGAA CTTCAAATAT TTTTGAAGCA GAAAAGAGCA GAAATGGTTT

180

GGAAGTAGCA ATCAGGACCA AAGAGACCAC CCAAACTCAA TAGCCCAACC TCTTAGCCTT

240

AGGGTATGCA GAATCTGAAA AGTGTAATCT CAATTTTGGA GCTCTACAGC AGTAAATCTG

300

CAAGTAACCA CCTGTAGCTA ATGTCCAGGG ATTAAAAAAA AGATAATGAA AATGTTTTTG

360

CCCAGGAAAC ATTAATTTCA GGCGTATCTA GAATGCAGTT CTTGCATTAT ACGTACTGGG

420

TTACCAGTCA TTGCAAAGTC ATGGTCCTTT GCCTCTCAGC TCAGTTCCCC TTCTGAAGAT

480

AAAAACATTT GCCTATGTGT CCAGGGAAGC TGTGAGGACA AAAAACCAAG CAACTTTTAC

540

AAGGGATCAT AAAAACCTAC CTAACAACTT GCTAATTAAA ACCTGATTTT TAATTTGCAT

600

TATTGAGCTT AATACCATTG CTTAAATGTA TGTGAATACT GAGATTTTTA TAAAGGAATT

660

AGTTACCTCT AGGAAAATAA TTATCACTAA AAGAAATAAT ATCGCAATTT GAATAAAAGT

720

AAGTGGCTTA AATCATAGGA AACATTTTTA GTGAAGGCGT TTGTTTTAAA TGTATTCTAA

780

CCTAGCAATG TAGAAAGCAG GCAAACATTT AAAAAAAATT TAACCAGTTT CTAAAACATA

840

GTTTGGAGCT CAGATTCTGG GGAAATGATT AACACACCCT ACATCCAAGG TCTCCTTTCC

900

TTTCTAGAAA GAAAGCATCT TTAAACATAC ATATTCATCA GAAATACAAA TATTTGTCAT

960

CAGTGATACT AATTTCCAGG CAAACATTTT AATTGCAGTC AATGTATTAG ATTCTACCAG

1020
FIGURE 6B

GTTTTAATTT GGATCGGTAA TACGGGTTTG ATTAGGGTTC TAGGCCTAAT TATAGGTCAC

1080

TAGTCTTCCT TTAGGATTAT GTACCATCTG TTATTTTAAA TTGACCATTG AGGGGGTTCC

1140

CAAGGGTTCT CCTTCGTTTT GTTATCAAAC GTTAGGTTTA GGATTCTTGC GGGTGGTGGG

1200

ATCCAAAGCC AGAGACGGTT TCAACAAAAA TGCAAGCGCG AATTTGTCTT GCGTCTGAAC

1260

GCACTGTTCA AATTAAACAA TTTAGACATG CCCCAACTAT GACACTAAGA AGTGAATGGT

1320

ATAGTACACT TTTGTCACAA ATTCAAGGGT AATTTAAGTG CCCGATGGTA GAGGTCTGGC

1380

TTCCCCTGGG TTTCTGGAAC AAAAGAAGCG TTCGCGAGGA GAGGGGTAAC TCCCCGCCCT

1440

CCCCCTCCCA AAGTAAATCA AATCAAGGAA TATGAGTGCC TGCAGACAAG CCTCGCTTCT

1500

TTTCTTCCCC TTCAGGGCTA GCGTTTGGGG AAGGCAAGGC TGCGGCTACT CTTGGAGCTT

1560

CAGTGTCCCG GGAGGAAGAA AGGCCCAGCC AAGGGTCCTC ACACTGGCGT GGAATTCGGC

1620

GCGTTCGTAG GCGATCGACC CCAGAGACGA AAGCTGCTTC TCAAGCTGGG GGAGGGAGAG

1680

GAAACGGCGC ACAAAAGCAG TACGACCTGT CCCTTATCGG CGTCTAAGGG GAAGGGTGGA

1740

GAAAACGAAA ACAGAAGCGG GCCGGGAGCC TCGGCTCCCG CCCCAGCGCC TTTTAAACTG

1800

CGTTTCTACC TCCTCTCGCT CAGCGCGGCG GCTAATGGAA CCCGCGCGAG CCGTCTCGCC

1860

AATCACCGCC GCGCTTCCTC CCGTCGCCCG CCAATGGCGG CGCGCGTTCT TGGGGCGTGG

1920

GCGAAGCAGG CTGCTCGCCT CCTGCCTGTA GTGTGTGGGC TGGGGTTGGT GCGAGCTTCC

1980

AGCTTGGCCG CAGTTGGTTC GTAGTTCGGC TCTGGGGTCT TTTGTGTCCG GGTCTGGCTT

2040
FIGURE 6C

GGCTTTGTGT CCGCGAGTTT TTGTTCCGCT CCGCAGCGCT CTTCCCGGGC AGGAGCCGTG

2100

AGGCTCGGAG GCGGCAGCGC GGTCCCCGGC CAGGAGCAAG CGCGCCGGCG TGAGCGGCGG

2160

CGGCAAAGGC TGTGGGGAGG GGGCTTCGCA GATCCCCGAG ATGCCGGAGT TCCTGGAAGA

2220

CCCCTCGGTC CTGACAAAAG ACAAGTTGAA GAGTGAGTTG GTCGCCAACA ATGTGACGCT

2280

GCCGGCCGCG GAGCAGCGCA AAGACGTGTA CGTCCAGCTC TACCTGCAGC ACCTCACGGC

2340

TCGCAACCGG CCGCCGCTCC CCGCCGGCAC CAACAGCAAG GGGCCCCCGG ACTTCTCCAG

2400

TGACGAAGAG CGCGAGCCCA CCCCGGTCCT CGGCTCTGGG GCCGCCGCCG CGGGCCGGAG

2460

CCGAGCAGCC GTCGGCAGGG TAAGGACGCG GGGCCGGGGC TACAAAGGC
FIGURE 7

GCCTAATATT ACTATAAGCC AATTTGGTAG TGAGTTTGCA TGTTTATGTT ATACAGGTTA

60

TTATCTAGTA AGTGAACACT TTAATTCATA TGGAAATGAT TACTGGACTT TGTTTACAGA

120

AAGCCACAAA AAAAACTGAT AAACCCAGAC AAGAAGATAA AGATGATCTA GATGTAACAG

180

AGCTCACTAA TGAAGATCTT TTGGATCAGC TTGTGAAATA CGGAGTGAAT CCTGGTCCTA

240

TTGTGGGTAA GTTGATAAAA TTTCAAATAC

270
FIGURE 8

TTTTTTTTAA GAACTTGAGT TTAGAAAAAT AAAGGGTGAA AATAGCTTAA AATGTTGCTG 

60 

AAGATAGTGT CTGAGCTGCA TCCTAAATGA AACAATACAG GTACTAAAGC AAGTTCTGCC 

120 

TTAATCCAGG AACAACCAGG AAGCTATATG AGAAAAAGCT TTTGAAACTG AGGGAACAAG 

180 

GAACAGAATC AAGATCTTCT ACTCCTCTGC CAACAATTTC TTCTTCAGCA GAAAATACAA 

240 

GGCAGAATGG AAGTAATGAT TCTGACAGAT ACAGTGACAA TGAAGAAGGT AAAATTTTAA 

300 

ATGATAGTT 
309 
FIGURE 9A

GGTAATGCTT AAACTTCCTG CCTCTTTTGC CTCTACAGGA AAGAAGAAAG AACACAAGAA

60

AGTGAAGTCC ACTAGGGATA TTGTTCCTTT TTCTGAACTT GGAACTACTC CCTCTGGTGG

120

TGGATTTTTT CAGGGTATTT CTTTTCCTGA AATCTCCACC CGTCCTCCTT TGGGCAGTAC

180

CGAACTACAG GCAGCTAAGA AAGTACATAC TTCTAAGGGA GACCTACCTA GGGAGCCTCT

240

TGTTGCCACA AACTTGCCTG GCAGGGGACA GTTGCAGAAG TTAGCCTCTG AAAGGAATTT

300

GTTTATTTCA TGCAAGTCTA GCCATGATAG GTGTTTAGAG AAAAGTTCTT CGTCATCTTC

360

TCAGCCTGAA CACAGTGCCA TGTTGGTCTC TACTGCAGCT TCTCCTTCAC TGATTAAAGA

420

AACCACCACT GGTTACTATA AAGACATAGT AGAAAATATT TGCGGTAGAG AGAAAAGTGG

480

AATTCAACCA TTATGTCCTG AGAGGTCCCA TATTTCAGAT CAATCGCCTC TCTCCAGTAA

540

AAGGAAAGCA CTAGAAGAGT CTGAGAGCTC ACAACTAATT TCTCCGCCAC TTGCCCAGGC

600

AATCAGAGAT TATGTCAATT CTCTGTTGGT CCAGGGTGGG GTAGGTAGTT TGCCTGGAAC

660

TTCTAACTCT ATGCCCCCAC TGGATGTAGA AAACATACAG AAGAGAATTG ATCAGTCTAA

720

GTTTCAAGAA ACTGAATTCC TGTCTCCTCC AAGAAAAGTC CCTAGACTGA GTGAGAAGTC

780

AGTGGAGGAA AGGGATTCAG GTTCCTTTGT GGCATTTCAG AACATACCTG GATCCGAACT

840

GATGTCTTCT TTTGCCAAAA CTGTTGTCTC TCATTCACTC ACTACCTTAG GTCTAGAAGT

900

GGCTAAGCAA TCACAGCATG ATAAAATAGA TGCCTCAGAA CTATCTTTTC CCTTCCATGA

960

ATCTATTTTA AAAGTAATTG AAGAAGAATG GCAGCAAGTT GACAGGCAGC TGCCTTCACT

1020
FIGURE 9B

GGCATGCAAA TATCCAGTTT CTTCCAGGGA GGCAACACAG ATATTATCAG TTCCAAAAGT

1080

AGATGATGAA ATCCTAGGGT TTATTTCTGA AGCCACTCCA CTAGGAGGTA TTCAAGCAGC

1140

CTCCACTGAG TCTTGCAATC AGCAGTTGGA CTTAGCACTC TGTAGAGCAT ATGAAGCTGC

1200

AGCATCAGCA TTGCAGATTG CAACTCACAC TGCCTTTGTA GCTAAGGCTA TGCAGGCAGA

1260

CATTAGTGAA GCTGCACAGA TTCTTAGCTC AGATCCTAGT CGTACCCACC AAGCGCTTGG

1320

GATTCTGAGC AAAACATATG ATGCAGCCTC ATATATTTGT GAAGCTGCAT TTGATGAAGT

1380

GAAGATGGCT GCCCATACCA TGGGAAATGC CACTGTAGGT CGTCGATACC TCTGGCTGAA

1440

GGATTGCAAA ATTAATTTAG CTTCTAAGAA TAAGCTGGCT TCCACTCCCT TTAAAGGTGG

1500

AACATTATTT GGAGGAGAAG TATGCAAAGT AATTAAAAAG CGTGGAAATA AACACTAGTA

1560

AAATTAAGGA CAAAAAGACA TCTATCTTAT CTTTCAGGTA CTTTATGCCA ACATTTTCTT

1620

TTCTGTTAAG GTTGTTTTAG TTTCCAGATA GGGCTAATTA CAAAATGTTA AGCTTCTACC

1680

CATCAAATTA CAGTATAAAA GTAATTGCCT GTGTAGAACT ACTTGTCTTT TCTAAAGATT

1740

TGCGTAGATA GGAAGCCTGG TACAAACAAT TTAACGCTTT CTAGATCACA TATTAGTCTC

1800

TAAGTTGTTT TCTGTTTCCT GCTTTACTTA TGTTTTTACA ATTCTCCAAA ACTAAGAAAA

1860

TTCTAATTAG GATATAAGGA GTATTTACTG TTCAATAGAA TAATATGCAT CCTCCTTTAT

1920

ACCTAGGACA GAATTAAACA TTTGTTACAC ATTCAGAACA GTGATGTTGT TCTTTTTGAT

1980

ACTTTTATCT CAGTATCTTT TCACGTTCCA TAACTTGTCC ATATTTTTGC TCATATTTTC

2040
FIGURE 9C

TTACTTTTCT TTGTTATTTA TTCATGTCTG CAACATCAAT CATAGTAGTC TAGATCAATG

2100

CAACTCAAAG CACCAGTCTA CAAACTGTTA CTTATCCACA GGCAAGATAA GCATGCACAA

2160

GAATTTAAAT CTAGAGATAC TTTTTAGGTC AATGACAGGA TTTGATTTTT TAGCAAAATT

2220

TTATTAATAG CTAAAGCAAT GTATTGATTT ACACTCTGAT GCAAGTAATT TATCTCTTCA

2280

TTGACTGGTA GCAACCAATT CATGGACCAG TACCATGGAC CACACTTTGA GAAACACTTC

2340

TTTGGATAAT AATAGATATC CTGGGATAGT GCATGTTCAC CATCTATTTT GTCAGATAAT

2400

GGGGCCTTTT AAAAAATAAT ACTTTGCTTT CATGATATAT TGTATTTTGT GGAAAGTTAA

2460

GTTTAGCAAT ATAGACTCTA AAAGCAAATT AAATTTTTTT AAGCCATAAG AAATTATACT

2520

ATATCCCAGT ATCTGTATGT CTGTATAAAG CAGTGTATTA TCATGTTTTC ATTTCTGTGA

2580

TTGTAAGTTA AGAGTCTTAA CTGCAGAGGT ATTGTGGAAA GTAGTAGCCT TAAGCATAAT

2640

AAAATATGGT CTCTTGGGTA CTCCCTCTGG CCATTACCAC ATTCTTAGAT TATATGTGTC

2700

CATCTTTGCA GCTTTCTGAG AGTAATTTTA TTTGTTGTCT TCTGAAATGT ACATGTATAC

2760

ATGTACCTAC TGAGTGCTAT GTGATTTTTA AAAATGTATT ACTGTAGAAT GCTTCTGCAA

2820

ATTCAATAAA GTTGTTAAAT TTGAACAGTG TTGTGTGGTC TCCAGAAACA TGTTGTTCTG

2880

TGTGCTTTAT TCTTGGAGTT GCAACAAGTT AAATATTTGT ATATGAACAC CCCTTTTCCA

2940

TTTCATCATT GAAACTCACT TTGACATTTC AGTGGTATAA TTGAAAATAT TCTGATTATG

3000

GTATGGTTTT TCTTCTGTTT GAGGACCATG TTTTTATTGA CTGTTGGAAT CTAAATTTAT

3060
FIGURE 9D

AAGAGGAATT AATCTGTAAC CAGATACCTT ATTCGTTTAA CCGATTTCTA TTCCACTACT

3120

AATGGTTTTT TTTACTTTGT GCTTTCTAGT TCTTAGATTC TGAGTTAACA AAGCATAAAA

3180

TAGATTTTAT ATTGCTGGGG GTTGTACCAA ACATAGATAC ATTGACAGAT CCATTGATAG

3240

AAGTGTTGGG AGATGGACTT GAAATCTTAG TCCTAAATAA AGACTGAACT GTTTAAACTA

3300

ACTTGATAAA AATCACTGTT CGTTTTTGGG AACCTGCAAG TATTAAATAG ATTCTGTACT

3360

AACTAGTATT AACACTGGAA AGTTAGCAAG ACACATATAG ATGTCTTGAC CACTTTTTCA

3420

CACAAGTTCA GAGTTCATGT AAAACTTTAG AATTGACTTC CTTTCTGTCT CTTCAGTAAG

3480

AAAGTAATCT AACTTAAATT TTTGGTAGTA GAAGTTTTAG AAATAACAAC TGACTAATTT

3540

TGCTATACCA TGATAAATGT CTACAAAAAG GGATTTTTTT TTTTTTGAAA TGAAGTTTCG

3600

CTCTTGTCCC CAAAGCTGGA GTGCAATGGT GTGATCTCAG CTCGCTGCAA CCTCCGCCTC

3660

CCAGGTTCAA GCAATTCTCC TGCCTTAGCC TCCTGAGTAG CTGGGATTAC AGGTGTCTGC

3720

CACCAACCCT GGCTAATGTT TGTATTTTTA GTAGAGACGG GGTTTCACCA TGTTGGCCAG

3780

GATGGTCTTA AGCTGACCTC AGGTGATCCG CCCACCTCGG CCTTCCAAAG TGCTGGGATT

3840

ACAAGCGTGA GCCACCATGC CCTGCCAGGG ATGTTAACTA AAATCAGTCT TTTCAAAACC

3900

TAGATCTTCA AATGATGATG AATTTAAGAC TAGGAGACTG GAATATTGAA GCCTATTAAA

3960

AAATACATAT CCTTGCGTTG TTAGGTTAAT AACTATCATG GTGACAAGTG TATAAGTATT

4020

GGCTTCTTTT CAAAGAAATG TTATTTTTAT TATAAGGACT GGGACGAGAA GTGACCTGTG

4080
FIGURE 9E

ATTGGTCGTA TTTTTCTGTG AACAAGATTC CTTCTTTACC TGAGTTGTAC CTAGGTTTTT

4140

TATAACTACA TCAAAAAAGC TTTTATATTG TACTTATTAA TGTTATGGCA GTTACTTATA

4200

GAAGCTTGGT ACTATATGGA TTTTTTTCAT TTTTAAACTT TCCCTTCTAT GTTCCAAATT

4260

TTAATTTAGT AAGTCAACCT TTGCTGTACA GTAGTAGTAT ACTGTATGGA CAAAACAATG

4320

GTAACAATTG TGTTATTTTA AATGGCCTTT TTCCACATCT AAATTGTTCT TACTGAAAAG

4380

CTTTCGTGGG AGCATTTTGA ACTCACTTCA TGTTCAAAGC TATGAGTCCC AGATAAAGGA

4440

AAGGAGAATA GGTAGGAAGA AGTGGTTGTC AGATGAAAGG GTAAGGGAGA TAAGCAAAAA

4500

TGGGATAAAT AATGAAACAG TTTTCAAGAC AAATTGCAGT TAAACAATTT TGGACTAGTG

4560

AGGTATTACC AGTAGACTTG TTTTTCACCT TTTAATGTGC CTAAAACCAG GGTTCCCGAT

4620

TAATATTAGG ATAACATCAT CATTTATTGA GTGTTTCAGA GAGTAATGGT TTCTCCAATG

4680

TTATTTCCAG ACTCTAAAAT AGAGCTCAAG CTTGAGAAGA GAGAACCACT AAAGGGCAGA

4740

GCAAAGACTC CAGTAACACT CAAGCAAAGA AGAGTTGAGC ACAATCAGGT ATCTTTAGTT

4800

TTATTACCAC CGTGTACAGG TATAAATAAC CTCCTGACAA CACTAATCCA TGTTTTAGCC

4860

TTTAATGGTT GACACCCAGA TTCCAGCACG CTCAATAAAC TTAATTCTGT TGTTAAATGA

4920

TACTAAAAAT CTAAACTTTG TGGGTTTTGT CAGTAGTGTA GCCTGTGATT ACAGCAAAAG

4980

CAAATTTTTA ATGTCTCATT TGTGTTTGAG TCTGTGCTAG ATGTAGTTCA AAGCCAGTTA

5040

TATGGTGTTT TGAAAGAAAT ATTTTAAAAG GTGGAAATAT CTAGACACTT TTGATACAAT

5100
FIGURE 9F

TTCTTTAAAA GGCAATGGAA GGGTTTTATA TTTGTGTCTT TGTCTCTAGA TTTCTGACTT

5160

TGATTTTATG TTTGCCTGTC TTGTCTTCTG CGATTCTTTC CTAAACTCAG AAGCTAGTCT

5220

GGTCCTAAGA CTACAGTTTT CTTTCCTTAT TTCAGATGAA AATTTACCTT TTCTATTGTG

5280

GGAGAGGCGT TTCAGTTTTT CAAAAGGGAA ATGTAGGAAA CTAAGGAGAA AATAAGCATA

5340

GGTATAAATG AACAGAGAAC AAATTATTGA CTAACCTAGT ATTGTTTACA GAGAACAAAT

5400

TATTGACTAA CCTAGTATTG TTTACAGAGA ACAAATTATT GACTAACCTA GTATTGTTTA

5460

CAGAGAACAA ATTATTGACT AATGCAGCAT TGATTTGGCT GATGCTTTAT AAGACAGCTA

5520

TTCCTAGAGT CATTTTCCTT ACCCCTGCTA TGTCTAGCTG GATGATTTGT CTAGTTGGTT

5580

ATCTTTTCCA TCTCCTATTT GTCACTTTGT GTGTTTGTTT GTGACGGAGT TTTGCTCTTG

5640

TCACCCGGGC TGGAGTGCAG TGGCGTGATC CCATCTCATT GCAACCTCCA CCCTGCTGGG

5700

CTCAAGCGAT TCTCCTGCCT CAGCCTCCCA AGTAGCTGGG ATTACAGGCA CATGCCACCA

5760

CGCTCAGCTA ATTTTTGTAT TTTTAGTAGA GACGGGGTTT CACCGTGTTG GCCAGGCTGG

5820

TCTCAAACTC CTGACCTCAG GTGATCCACC TGCATTGGCC TCCGAAAGTG CTGGGATTAC

5880

AGGTGTGAGC CACCGCGCCC AGCCTGTTTG TCGTTTTAAA ATCAAATCCT TAGAGGAATT

5940

ATTCTTGATT CCTTAAGGCA AGTCAGTCTC TCTCTTCATT TGATGTAGTT GATAAGTTGA

6000

ATTTCAGAAC GATTTGTTAG AAATGAGCTT TGTGACAAGA ACATACAGAG CATTGAATGA

6060

ATGAAGACTT TGTTAACATA GAACCAAATA CTGGAATACA TGTTTTATTG CCCTTTTATG

6120
FIGURE 9G

TAGTAGTCCT AACAAATAGC TTCAGGAGCA TGCTGAAGAA TAAGGAAATA GGCCGGGCGC

6180

AGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGCCGA GGCGGGCGGA TCACCTGAGG

6240

TCCGGAGTTC GAGACGAGCC TGACCAACAT GGAGAAACCC TGTCTCTACT AAAAATACAA

6300

AATTAGCCAG GCATGGTGGT GCATGCCTGT AATCCTAGCT ACTCCGGAGG CTGAGGCAGG

6360

AGAATCGCTT GAACCTTGGA GGCGGAGGTT GCTGTGAGCC GAGATCGCGC CATTGCACTC

6420

TAGCCTGGGC AACAAGAGCG AAACTCTGTC TCAAAAATAA ATAAATAAAT AAATAAATAA

6480

TAAAGATGGT CTAAGGGATA ATTGAGTTGG AGGAATCTAA ACTGAGGAGC AGAATAAATA

6540

GTCAAAGGAG TGTAGGTTTA GATGACAGGC AGAATTAGAC AGTGGCTTTA TTGCAGAAAA

6600

TTTAAAACAT GTAGAAGAGT GGAGGGAAGA GTTTAATGAC CCTCCAGTCA TAGATGTGCC

6660

ACAGTTGTCA GTGTTTTACC AGTTTGGTTT CATCACCCCA CACCCTCCAG CTTTTTAAAT

6720

TACTTTTTTT TTTTTTTTTT TTTTGAGACA GAATCTCACT CTGTCGCCCA GGTTGGAGTG

6780

CAGTGGCACA ATCTCGGCTC ACTTGCAACC TCCGCCTCCT GGGTTCAAGC AGTTCTCCTG

6840

CTTCAGCCTC CTGAGTAGCT GGGATTACAG GTTCCCGCCA CCATGCCCAG CTAGTTTTTG

6900

AATTC

6905
FIGURE 10

GGCCGTTATT AAAGTATTTG 

GAGCTATTCT CAAGCTGGAA 

TTTAGT 
146 



ATGTTAGAAT ATGACTTTTT 

60 

TAACTGAGAC TGAATGGACA 

120 
FIGURE 11A



GAATTCTCAG ACATACCCAG AAGAGCACCA AAGAAACCAT TGACAAGAGC TGAAGTTAAA

60 



TGAATACAAT TTAGATCGAT GCTATCATTG ATCTTTCAAA GAGGAAATAT AAATATTTGT 

120 

TTGTTTTTTT TTTTTTTTTT TTTTTGGAGT GGGAGGGAGC AGAGCTTTCT TTATTGGGTG 

180 



GTGTGTGTAA ATATGCATAA ATTATTTTAA GGACACTTTT ATTAGTGAAA ACAGAAATTA 

240 



ACTAATATAG TTTGACATGC TAATACTATC CTCTTGCCAC TGTATCCCTC TTAAGTTTCA 

300 



GTTTTCAAGG GAGTGGTCAT TTCTGTTCTG CTTATAATAC GTTCTGTTTG TCATGGATGC 

360 



AAGTTTGAAA GTGCTGCTGG GCTATGAGGG AAGCTACACT TTCTTTTGTT GACGGGTCAA

420 



ATAAGAGACT TTAACATAAC TTCATGTGGT GCTGGAATGG GTTAACTCCT GTGTAATTAA 

480 



GCTTGAAAGT ACTAACTGCA TGGTGCATTT TAATTTGAAT GAATCTTTAC AAAAGGAGCC 

540 



GAAACATTTA TTATTTTGTT CTGGACATAA AGGTAAACAG TAAAACAAGC TAAACATTAT 

600 



ATTTTTTTTA ATTTGGGCAT ATTGCATTGA CTATAATTTT AAAACATATA GTAAAGTTTG 

660 



CTAAAGGTTC ATGTTAAGTA TCTTCTGTAT TTTATCTTTT ACTACAGAAT TCTTAATAAA 

720 



TGAAGAAACA GTATACTTTT TTTTTTTTGA AACGGAGTCT TGCTCTGTTG CCCAGGCTGG 

780 



AGTGCAGTGG CACGGTCTCA GCTCACTGCA ACCTCCGCCT CCTGGGTTCA AGCGATTCTC 

840 



CTGCCTCAGC CTCCTGAGTA GCTGGGATTA CGGGTGCCCG CCACCACGCC CAGCTAATTT 

900 



TTGTATTTTT AGTAGAGATG GGATTTCACT ATGTTGGTCA GGCTGGTCTC GAACTCCTGA

960



CCTCAGGTGA TCCACCCGCC TCAGCCTCCC AAAGTGCTGG GATTACAGGC GTGAGCCACC 

1020



GTACCCGGCC CAACAATATA CTCTTAATAG TAAAAAAAAA AATCAATGAG GGAATTAGCA 




1080 
FIGURE 11B

TTGCCATGTT AGATTACTGA ATTTTTTCCC CCCAGTTATT TAAATGTGTG GCAGTTTTTC

1140 



ACTCCAATAA CCACATATAT AAGGTATCTT CAAGAAATTT GAAGAGAGCC TTGGAAGCAT 

1200 



GTGGATACCT AAATAATAAT TTAGAAATGG CAGCTGTAAA ATCTATGATA GATTTAAAGG

1260 



CATTTTGTTC TCTAAAACTT ACTATTTATG TTTTAATTAT TGCATGTTTA CACTAAATTT 

1320 



TAACTTGTGG TTGTTTGTTT GTCTGTTTCT TATTAGGTGG GAGAAAAAAC AGAGGAAAGA 

1380 



AGAGTAGAAA GGGATATTCT TAAGGAAATG TTCCCCTATG AAGCATCTAC ACCAACAGGA 

1440 



ATTAGGTATT CAGATACATT TAAACAAGTA CTAGTGTATT CTAGTAGGGA ATCTTTATTT 

1500 



TTAATTCTTC ATCACAAAGT TACTGTACTT CTGCTCAGAA TTAAGCTGTT TTTTTGGAGC

1560 



CAGGTACAAT GGTACATGCC TGTAGTCCCA GCTACTCAAG AGACTAAGGC AGGTGGATTG 

1620 



CTTAAGCCCA GGAGTTCGAG ACCAGTCTGG GCAACATAAT GAGACTCTGT CTACCCACCG 

1680 



ACTTTTAAAT AAATAAGTGA ACGAACTGTT CTTTTGGGAG CAGTTTAATT CCCCTAAGCC 

1740 



CACCAGTAAA TATTCTTGAA AGTTGAATGA GTTTCAAAAT TGCAATTGGT AAATGGAGGA 

1800 

TTTTAAATAA ATCACTTACT TACTACTTAA ATTCATATGT TAAAATCTAA ACGAAAATCT 

1860 



AGTGAGAAAT AGTTTTTCCC CCCTAAGTTT CTCTTTTCTA ATTGAAAAGG CTGTGTTTAT 

1920 



GTATCTTAAT TTTTAACTGA AAGAAACACA GCCAAAACTG TTTTGTTTTT CCTCTTTACT

1980 



CCTAAACATG AGCTTTGAAG AGCCTTTAAG AAGGCAGGTT TAGTGACTAT TTATCAGTGT

2040 



TCTGAATATT CTTTTCTAAT AAAGGATGCA TCGAACTTTT TTTTTTTTTT TTGAGACGGA 

2100 



GTTTTGCTCT GTTGCCCAGG CTGGAGTACA GTGGCGTGAT CTCGGCTCAC TACAACCTCC 

2160 
FIGURE 11C
TCTCCTGCCT CAGCGTCCTG 

ATTTTTGTAT TTTTGTAGAG 
TGACCTCAGG TGATCTGCCC 
ACCATGCCCA GCCCATCTAA 
ACTGACCTCA CTGCTTTATA 
AATTAGGAGT GGCAATGACA 
ATTTTTCTTT TCCTCCTTTC 
CTGCAGGCCG GCCATTAGAA 
ATGTTCCTAA GTATGTTCCC 
CCATTCCCGT ATGGATAAAA 
TCTATCAAGC TATGGAAACC 
CTAGAAAATC CAACTGAATG 
TAACTGTTGA AAAACATTTG 
TCGCCTCAAT AAATGTAGTA 
ATTAAAATGT TTTGAACTTT 
TTTTAGCTCT GGAACTTTTT 
TTGAAAAAAT GTATATGTTT 
AATGTGAAGC TACACATTTA 
AGTAGCTGGG ATTACAGGCA

2220 

ACAGAGTTTC ACCATGTTGG 

2280 

GCCTTGGCCT CCCAAAGTGC 

2340 

CTTTTAAAAT AGAAAATATG 

2400 

TGTGTCAATT TTCTACTGTT 

2460 

TTTTATGTTA TTTCTCTAAA 

2520 

ACTCCCAACA GTGCTAGTTG 

2580 

CTCAGTGATT TCAGGATGGA 

2640 

TTGGCAGATG TCAAGTCAGA 

2700 

ATTTTGCTGT TTGTTGTTGT 

2760 

AACCAAGTAA ATCCCTTCTC 

2820 

GTATCTCTTT GGCACGTTCA 

2880 

TGTACACTTG TTGACTCCAA 

2940 

TTTCATTGAA AAGCAAACAA 

3000 

GGACTAGTAG GAGATCACTT 

3060 



GTAGGCTTTA TTTTTTTAAT 

3120 

TTTGTGTATT TGGGAAACGA 

3180 

AATACTTAGA ATTC 
3234 

FIGURE 12A

GAATTCAGAT AGAATGTAGA CAAGAGGGAT GGTGAGGAAA ACCTACGGCA AGCAGCTCTA 

-1828 

TAATTTCTGG AAAGTGACCA GGAGGCTCGT GGATGATGGC AGGAAGGAGA GTAGAAAGTG 

-1768 

ATTTAGCCAG AAGATATGAA CTTCAAATAT TTTTGAAGCA GAAAAGAGCA GAAATGGTTT 

-1708 

GGAAGTAGCA ATCAGGACCA AAGAGACCAC CCAAACTCAA TAGCCCAACC TCTTAG CCTT 

-1648 

AGGGTATGCA GAATCTGAAA AGTGTAATCT CAATTTTGGA GCTCTACAGC AGTAAATCTG 

-1588 

CAAGTAACCA CCTGTAGCTA ATGTCCAGGG ATTAAAAAAA AGATAATGAA AATGTTTTTG 

-1528 

CCCAGGAAAC ATTAATTTCA GGCGTATCTA GAATGCAGTT CTTGCATTAT ACGTACTGGG 

-1468 

TTACCAGTCA TTGCAAAGTC ATGGTCCTTT GCCTCTCAGC TCAGTTCCCC TTCTGAAGAT 

-1408 

AAAAACATTT GCCTATGTGT CCAGGGAAGC TGTGAGGACA AAAAACCAAG CAACTTTTAC 

-1348 

OTF 

AAGGGATCAT AAAAACCTAC CTAACAACTT GCTAATTAAA ACCTGATTTT T AATTTGCAT 

-1288 

TATTGAGCTT AATACCATTG CTTAAATGTA TGTGAATACT GAGATTTTTA TAAAGGAATT 

-1228 

AGTTACCTCT AGGAAAATAA TTATCACTAA AAGAAATAAT ATCGCAATTT GAATAAAAGT

-1168 

AAGTCGCTTA AATCATAGGA AACATTTTTA GTGAAGGCGT TTGTTTTAAA TGTATTCTAA 

-1108 

CCTAGCAATG TAGAAAGCAG GCAAACATTT AAAAAAAATT TAACCAGTTT CTAAAACATA 

-1048 

GTTTGGAGCT CAGATTCTGG GGAAATGATT AACACACCCT ACATCCAAGG TCTCCTTTCC 

- 988 

TTTCTAGAAA GAAAGCATCT TTAAACATAC ATATTCATCA GAAATACAAA TATTTGTCAT 

- 928 
FIGURE 12B



CAGTGATACT AATTTCCAGG CAAACATTTT AATTGCAGTC AATGTATTAG ATTCTACCAG 

- 868 



GTTTTAATTT GGATCGGTAA TACGGGTTTG ATTAGGGTTC TAGGCCTAAT TATAGGTCAC 

- 808 



TAGTCTTCCT TTAGGATTAT GTACCATCTG TTATTTTAAA TTGACCATTG AGGGGGTTCC 

- 748 



CAAGGGTTCT CCTTCGTTTT GTTATCAAAC GTTAGGTTTA GGATTCTTGC GGGTGGTGGG 

- 688 



ATCCAAAGCC AGAGACGGTT TCAACAAAAA TGCAAGCGCG AATTTGTCTT GCGTCTGAAC 

- 628 



GCACTGTTCA AATTAAACAA TTTAGACATG CCCCAACTAT GACACTAAGA AGTGAATGGT 

- 568 



ATAGTACACT TTTGTCACAA ATTCAAGGGT AATTTAAGTG CCCGATGGTA GAGGT CTGGC 

- 508



Sp1 H4

TTCCCCTGGG TTTCTGGAAC AAAAGAAGCG TTCGCGAGGA GAGGGGTAAC TC CCCGCCCT 

- 448 



TF-1 

CCCCCTCCCA AAGTAAATCA AATCAAGGAA TATGAGTGCC TGCAGACAAG CCTCGCTTCT 

- 388 



TTTCTTCCCC TTCAGGGCTA GCGTTTGGGG AAGGCAAGGC TGCGGCTACT CTTGGAGCTT 

- 328 



CAGTGTCCCG GGAGGAAGAA AGGCCCAGCC AAGGGTCCTC AC ACT GG CGT GGAATTCGGC 

- 268 



H4TF-1 

GCGTTCGTAG GCGATCGACC C C AGAGACG A AAGCTGCTTC TCAAGCT GGG GGAGGGA GAG 

- 208 



GAAACGGCGC ACAAAAGCAG TACGACCTGT CCCTTATCGG CGTCTAAGGG GAAGGGTGGA 

- 148 



ISRE-like Sp1-like Sp1 TATA-like

GAAAACGAAA ACAGAAGCGG GCCGGGAGCC TCGGCTCCCG CCCCAGCGCC T TTTAAA CTG 

- 88 
FIGURE 12C

CTF 

CGTTTCTACC TCCTCTCGCT CAGCGCGGCG GCTAATGGAA CCCGCGCGAG CCGTCTCGCC 

- 28 

* Sp1 CTF *

AATCACCGCC GCGCTTCCTC CCGTCGC CCG CCAAT GGCGG CGCGCGTTCT TGGGGCGTGG 

+1 33 

GCGAAGCAGG CTGCTCGCCT CCTGCCTGTA GTGTGTGGGC TGGGGTTGGT GCGAGCTTCC 

93 



AGCTTGGCCG CAGTTGGTTC GTAGTTCGGC TCTGGGGTCT TTTGTGTCCG GGTCTGGCTT 

153 



GGCTTTGTGT CCGCGAGTTT TTGTTCCGCT CCGCAGCGCT CTTCCCGGGC AGGAGCCGTG 

213 

Primer 

AGGCTCGGAG GCGGCAGCGC GGT CCCCGGC CAGGAGCAAG CGC GCCGGCG TGAGCGGCGG 

273 

MetPro 

CGGCAAAGGC TGTGGGGAGG GGGCTTCGCA GATCCCCGAG ATGCCG 

319 